Documentation Institute to insure indefinite
availability.

3. Prepares at least 100 mimeographed copies of 
the full report, which the author will send without 
charge to all who request it as long as the supply 
lasts. 


4. Agrees not to submit the full report to another 
journal of general circulation. 

Brief Report. The Brief Report should give
a clear, condensed summary of the procedure 
of the study and as full an account of the re- 
sults as space permits. 

To insure that the Brief Report will be no 
longer than one printed page, its typescript, 
including all matter except the title and the
author’s lines, must not exceed 85 lines
averaging 42 characters and spaces in length.
Set the typewriter margins for short lines of 
42 characters, which are 3.5 inches long in 
elite typing, and 4.2 inches long in pica. 
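
The length rule above reduces to simple arithmetic, which the following sketch checks. It is an illustration only: the function names are hypothetical, blank lines are assumed not to count toward the quota, and "85 lines averaging 42 characters" is read as a cap of 85 lines and 85 x 42 characters in total. The pitches of 12 characters per inch for elite and 10 for pica are the usual typewriter values and agree with the 3.5- and 4.2-inch figures quoted.

```python
"""Illustrative check of the Brief Report length quota (hypothetical helpers)."""

MAX_LINES = 85        # line quota stated in the instructions
AVG_LINE_CHARS = 42   # average characters and spaces per line

# Usual typewriter pitches in characters per inch (assumed; they match
# the 3.5 in. and 4.2 in. line lengths given in the instructions).
PITCH = {"elite": 12, "pica": 10}


def line_length_inches(chars: int, typeface: str) -> float:
    """Width in inches of a typed line of `chars` characters."""
    return chars / PITCH[typeface]


def fits_quota(typescript: str) -> bool:
    """True if the text stays within 85 lines and 85 * 42 characters.

    Assumptions: blank lines are ignored, and the title, author lines,
    and footnote have already been removed by the caller.
    """
    lines = [ln.rstrip() for ln in typescript.splitlines() if ln.strip()]
    if len(lines) > MAX_LINES:
        return False
    total_chars = sum(len(ln) for ln in lines)
    return total_chars <= MAX_LINES * AVG_LINE_CHARS


if __name__ == "__main__":
    print(line_length_inches(42, "elite"))  # width of a 42-character elite line
    print(line_length_inches(42, "pica"))   # width of a 42-character pica line
    sample = "\n".join(["x" * 42] * 85)
    print(fits_quota(sample))
```

Counting total characters rather than testing each line lets a few long lines be offset by short ones, which is what an average implies.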

The manuscript of the Brief Report must 
be double spaced throughout. Except for its 
short lines, it follows the standard style of 
the 1957 revision of the APA Publication 
Manual. Headings, tables, and references are 
avoided or, if essential, must be counted in 
the 85 lines. Each Brief Report must be ac- 
companied by a footnote in the style below, 
which is typed on a separate sheet and not 
counted in the 85-line quota: 


1 An extended report of this study may be obtained
without charge from John Doe (giving the
author’s full name and address) or for a fee from 
the American Documentation Institute. Order Docu- 
ment No. —— from ADI Auxiliary Publications 
Project, Photoduplication Service, Library of Con- 
gress; Washington 25, D. C., remitting in advance 
$— for microfilm or $— for photocopies. Make 
checks payable to: Chief, Photoduplication Service, 
Library of Congress. 


Extended report. Because the extended re- 
port is intended for photoduplication, and is 
not copy to be sent to a printer, its style 
should differ in several ways from that of 
other manuscripts: (a) The extended report
should be typed with single spacing for
economy in duplication. (b) Tables and figures
should be placed adjacent to the text
which refers to them. A caption should be 
typed below each figure. (c) Footnotes should 
be typed at the bottom of the page on which 
reference is made to them. In other respects, 
the full report is prepared in the style speci- 
fied by the Publication Manual. 





Journal of Consulting Psychology 
19¢ \ N 463-4 


FOOD-RELATED RESPONSES TO AMBIGUOUS 
STIMULI AS A FUNCTION OF HUNGER 
AND EGO STRENGTH 1


SEYMOUR EPSTEIN 


University of Massachusetts

1 This paper was presented, in part, at the Eastern
Psychological Association, New York, April 196

The study is part of a project on the measurement
of drive and conflict which is being supported by
Grant M-1293 from the National Institute of Mental
Health, United States Public Health Service.
Appreciation is expressed for the assistance of Jane
Nelson, Alan Leiman, and Morton Berger.

In a previous study on hunger and thematic
apperception (Epstein & Smith, 1956) a theoretical
model was proposed which could resolve the
discrepancies in a wide variety of studies on the
influence of hunger upon food-related responses.
According to this model drive can produce an
increment, a decrement, or no change in number of
drive-related responses. The model integrates
Miller’s (1951) model of displacement and conflict
with the psychoanalytic model of thinking
(Rapaport, 1951). Briefly, it is assumed that there
are two fundamental processes associated with every
drive state: an autistic drive-oriented expressive
tendency, and a reality-oriented inhibitory
tendency. It is further assumed that the gradient
of expression as a function of increasing
stimulus-relevance is less steep than the gradient
of inhibition. It follows that for stimuli of
relatively low relevance, increases in drive should
result in an increase in number or intensity of
drive-relevant responses, while for stimuli of high
relevance the reverse should occur. In this respect,
research on both the sex drive (Leiman & Epstein,
1961) and the hunger drive (Epstein & Smith, 1956)
has indicated the importance of attending to the
drive-relevance of the stimulus. It is further
assumed that response-produced cues function in a
similar manner to stimulus-produced cues, and that
it is as important to consider a dimension of
response-relevance as of stimulus-relevance. In this
connection, drive-relevant latent responses, i.e.,
thoughts and images, are presumed to produce cues
which favor inhibition. Finally, it is assumed that
there are individual differences in tendency to
inhibit drive representatives which can be related
to the concept of ego strength.

The purpose of the present study was to
investigate different categories of responses, as
derived from the above theoretical approach, and to
determine whether a measure of ego strength could
be related to the influence of drive upon
drive-related responses. The Rorschach test was
investigated, for, despite the limitation of a small
yield of food responses, it offers several scores of
interest in terms of the model described, allows
these responses to be obtained in a situation where
stimulus characteristics play a minimal role, and
affords a possible measure of ego strength. The
following hypotheses were tested:



1. With increasing hunger there is an increase
in food-related responses up to a point,
followed by a decrease. This hypothesis follows
from the assumption that strong cues, whether
stimulus-, response-, or drive-produced, favor
the inhibitory process. It is consistent with
findings in other studies (Levine, Chein, &
Murphy, 1942; Wispé, 1954).

2. Food-related responses of low drive-rele- 
vance are more strongly associated with hun- 
ger than are food-related responses of high 
drive-relevance. This hypothesis follows from 
the assumption that responses that produce 
strong cues are more readily inhibited than 
responses that produce weak cues. 

3. Accurate and popular food responses are 
more strongly associated with hunger than 
are inaccurate and unusual food responses. 
This hypothesis follows from the assumption
that the reality-oriented inhibitory process is
increasingly dominant at higher drive states, 
at least up to the point of intense drive at 
which a breakdown of controls occurs. This 
view is consistent with findings in a previous 
study (Levine et al., 1942). 

4. Food-related activity-responses are more
strongly associated with hunger than food-related
object-responses. This hypothesis is
based upon the assumption that drive has 
activating as well as directing properties, and 
that responses which reflect both are the best 
drive representatives. 

5. People of low ego strength demonstrate 
a stronger relationship between hunger and 
food-related responses than people of high 
ego strength. This hypothesis is based upon 
the assumption that one of the major aspects 
of ego strength is the inhibition of drive rep- 
resentatives. It is consistent with reports of 
marked individual differences in inhibiting 
thoughts about food (Sanford, 1937). 
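
The gradient assumption behind Hypothesis 2 can be made concrete with a toy calculation. The sketch below is not the author's formulation: the linear gradients, the parameter values, the 0-to-1 relevance scale, and the choice to let drive multiply both tendencies are all assumptions made for illustration. It shows only that when the inhibition gradient is steeper than the expression gradient, raising drive strengthens the net response tendency at low relevance and weakens it at high relevance.

```python
"""Toy illustration of crossing expression and inhibition gradients.

All parameter values and the linear form are hypothetical; the source
states only that the inhibition gradient is the steeper of the two.
"""


def net_tendency(drive, relevance,
                 expr_base=1.0, expr_slope=0.5,
                 inhib_base=0.2, inhib_slope=2.0):
    """Expressive minus inhibitory tendency for a cue of given relevance.

    `relevance` runs from 0 (unrelated to the drive) to 1 (highly related).
    Drive is assumed to scale both tendencies equally.
    """
    expression = drive * (expr_base + expr_slope * relevance)
    inhibition = drive * (inhib_base + inhib_slope * relevance)
    return expression - inhibition


if __name__ == "__main__":
    for relevance in (0.2, 0.9):
        low_drive = net_tendency(1.0, relevance)
        high_drive = net_tendency(2.0, relevance)
        trend = "rises" if high_drive > low_drive else "falls"
        print(f"relevance {relevance}: net tendency {trend} with drive")
```

With these illustrative values the two gradients cross at a relevance of about 0.53, so a cue at 0.2 gains and a cue at 0.9 loses as drive doubles. The sketch does not reproduce the rise and fall over deprivation time stated in Hypothesis 1.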


METHOD 


Four levels of hunger were investigated by testing
60 subjects shortly after the noon meal, 60 shortly
before the evening meal, 30 before the evening meal
after they had abstained from lunch, and 30 before
the evening meal after they had abstained from
breakfast and lunch. Subjects who missed one meal
were paid $3.00; those who missed two meals $5.00.
The remainder were unpaid volunteers. All were
college students, and in each group one-third were
female. Subjects were further screened by a
questionnaire on what had been eaten when, and by the
following self-rating scale on hunger:

Indicate how hungry you feel at the present moment,
by placing a check mark to the left of the
appropriate statement:

____ (a) Not hungry at all (the thought of eating
         has absolutely no appeal to me at the
         moment)
____ (b) Slightly hungry (would eat something
         very good, but the thought of food, in
         general, is not appealing at the moment)
____ (c) Fairly hungry (the thought of food is
         somewhat appealing at the moment, and
         could enjoy something good)
____ (d) Hungry (the thought of food is appealing
         at the moment, and even something
         ordinary would be welcome)
____ (e) Very hungry (can’t wait to eat something;
         almost anything would taste good)


Finally, groups were equated on total number of
Rorschach responses (R). The final group consisted
of 41 subjects who had not eaten for 0-1 hours with
ratings of a to c on the subjective hunger scale,
40 who had not eaten for 4-5 hours with ratings of
[illegible] and d, 22 who had not eaten for 8 hours
with ratings of d and e, and 21 who had not eaten
for 23 hours with ratings of d and e.


Responses were scored in the following categories:

1. Food Imagery: Includes all other categories and
consists of all responses with food-association
value.

2. Strong Food Associations: Names of prepared
foods, e.g., “fried egg”; people eating or preparing
food, e.g., “two people cooking.”

3. Weak Food Associations: Names of unprepared
foods, e.g., “apple”; animals eating or seeking
food, e.g., “a squirrel eating a nut”; food-related
objects or implements, e.g., “pot,” “potato sack”;
people in activities of questionable food relevance,
e.g., “two people lifting a bowl.”

4. Food-Related Objects: Names of prepared and
unprepared foods and of implements, e.g., “piece of
ham,” “pot.”

5. Food-Related Activity: People or animals
seeking, preparing, or eating food, e.g., “two
people cooking.”

6. Instrumental Responses: People or animals
seeking or preparing food; names of foods in
inedible form, e.g., “raw egg”; utensils.

7. Goal Responses: Food in edible form; people
or animals eating.

8. Accurate Food: Food, prepared or unprepared,
which accurately conforms to the contours of the
blot.

9. Inaccurate Food: Food, prepared or unprepared,
which does not accurately fit the blot.

10. Popular Food: Prepared or unprepared food
produced at least six times to the same blot area in
the total sample.

11. Original Food: Prepared or unprepared food
produced no more than [illegible] to a particular
blot area.

Finally, “ego strength” was measured by the form
level score of Klopfer (Klopfer, Ainsworth, Klopfer,
& Holt, 1954), with the exception that total rather
than average form level was used, as essentially what
was wanted was a score of goodness of response, and
it was assumed that, holding quality constant,
producing more responses indicated more ability than
producing fewer responses. Although total form-level
was directly related to R and might be expected to
be directly related to number of food responses, the
relationship was actually inverse and did not
approach significance. Thus there was no basis for
concern over the two measures being confounded. In
defense of the measure of ego strength, it includes
perceptual accuracy, integrative ability, and
self-imposed motivation, all of which are
characteristics of ego strength.

Before scoring, the data were coded to prevent
scoring bias. In addition, the data on food-relevant
responses and on ego-strength were separately
represented and independently scored to prevent
confounding of the measures derived from them.


RESULTS

Comparisons were made of the number of 
subjects in each group who fell above and be- 
low the median cutting point for the pooled 
group. In each category the cutting point 
turned out to be between zero and one re- 
sponse. Two-tailed chi square tests were used 
to evaluate significance. 
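
The median-split comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the author's computation: the function names and sample counts are hypothetical, an ordinary Pearson chi square without continuity correction is assumed, and scores equal to the pooled median are counted as "below." Because the pooled median fell between zero and one response, the split amounts to subjects with no response against subjects with at least one.

```python
"""Sketch of a pooled-median split followed by a Pearson chi square.

Hypothetical helper names and data; every group is assumed to be
non-empty and at least one subject must fall on each side of the cut.
"""
import statistics


def median_split_table(groups):
    """Count, per group, the scores above and at-or-below the pooled median."""
    pooled = [score for group in groups for score in group]
    cut = statistics.median(pooled)
    above = [sum(1 for score in group if score > cut) for group in groups]
    below = [sum(1 for score in group if score <= cut) for group in groups]
    return [above, below]


def chi_square(table):
    """Pearson chi square and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df


if __name__ == "__main__":
    # Made-up response counts for two deprivation groups.
    groups = [[0, 0, 1, 0], [1, 2, 0, 1]]
    table = median_split_table(groups)
    print(table, chi_square(table))
```

With four deprivation groups the same table has 3 degrees of freedom; combining the 8- and 23-hour groups reduces it to 2.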

In Figure 1 it can be seen that self-rated 
hunger increases to 8 hours of deprivation 
and then levels off. The results are essentially 
the same when the entire pool of subjects is 
used as when the sample is restricted to 
subjects screened on subjective hunger and 
matched on R. The similarity of the subjec- 
tive hunger ratings of the 8- and 23-hour 
groups suggests that hunger may fail to in- 
crease between 8 and 23 hours of deprivation 
and raises the question of whether the two 
groups should be treated separately or com- 
bined to provide a sample as large as for the 
other groups. Accordingly, where indicated,
the data were analyzed both ways. 

[Figure 1: mean self-rated hunger plotted against
hours without food, with separate curves for the
full group (N = 180) and the restricted sample
(N = 124).]

In regard to Hypothesis 1, which stated
that there is an increase in food imagery up
to a point and then a decrease, Figure 2
indicates that this, in fact, did occur. The
relationship between total food imagery and time
without food, however, is not statistically
significant (p = .15). If weak food associations
are substituted for total food imagery,
a falling off of responses again is indicated
(see Figure 2), but the relationship now
becomes significant (.05 level). Apparently
strong food associations reduce the
discriminability of the total food imagery score.
When the 8- and 23-hour groups were combined,
both total food imagery (.05 level)
and weak food imagery (.01 level) increase
directly and significantly with increasing hunger.
It may be concluded that as deprivation
increases there is an increase in weak and
overall food associations, up to a point,
followed by a levelling off, or possibly decrease,
somewhere between 8 and 23 hours of
deprivation.

[Figure 2: curves for total, weak, and strong food
imagery plotted against hours without food.]

Fig. 2. Total, weak, and strong food associations as
a function of time without food.



increase in 


The finding of a significant relationship be- 
tween weak, but not strong, food associations 
and time without food is consistent with Hy- 
pothesis 2, which stated that responses of 
low drive-relevance are more strongly asso- 
ciated with hunger than responses of high 
drive-relevance. Despite this, the results do 
not entirely support the model, as according 
to the model strong food associations should 
fall off more rapidly than weak food associa- 
tions, whereas Figure 2 indicates that the 
reverse tended to occur. 

The only additional score which significantly
(.05 level) discriminated the hunger groups was
the food activity score, which, in line with
hypothesis, varied directly with hunger.



In order to investigate the effects of ego
strength, form-level scores were summed for
all nonfood responses, and a cutting point
selected to divide the group as nearly in half as
possible. This resulted in 67 subjects in the
low form-level group and 57 in the high
form-level group. In Figure 3 it can be seen that
for the low ego strength group there is an
increase in total food imagery through 23
hours, whereas for the high ego strength
group there is an increase followed by a
decrease.

[Figure 3: curves for the high and low ego-strength
groups plotted against hours without food.]

Fig. 3. The influence of ego strength upon the
relationship of food associations to time without
food.

A comparison of the high and low ego 
strength groups on all levels of deprivation, 
simultaneously, fails to reveal significant dif- 
ferences. Significance (.05 level) is obtained, 
however, if the comparison is restricted to the 
23-hour groups. This latter finding must be 
treated with caution, as it could be a conse- 
quence of capitalizing on chance probabilities. 

In order to determine whether the decrease
in food responses for the high ego strength
group reflects a decrease in drive rather than
an increase in response-inhibition, hunger
ratings were examined in reference to ego
strength. The high and low ego strength
groups both reported an increase in hunger
up to 8 hours and a leveling off to 23 hours.
The groups did not differ at 1 and 4 hours,
but at 8 and 23 hours the high ego strength
subjects rated themselves significantly (.05
level) less hungry than the low ego strength
subjects (chi squares were, respectively, 4.70
and 4.87, df = 1). The self-ratings suggest
that the high ego strength group either
inhibits awareness of hunger or is actually in a
state of reduced hunger as compared to the
low ego strength group.
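
The significance levels attached to the two chi squares reported above (4.70 and 4.87, df = 1) can be checked directly. The sketch below rests on the standard identity that, for one degree of freedom, the upper-tail probability of chi square equals `erfc(sqrt(x / 2))`; the function name is hypothetical.

```python
"""Upper-tail probability of chi square with one degree of freedom."""
import math


def chi2_sf_df1(x: float) -> float:
    """P(chi square >= x) for df = 1, using the complementary error function."""
    return math.erfc(math.sqrt(x / 2.0))


if __name__ == "__main__":
    for statistic in (4.70, 4.87):
        p = chi2_sf_df1(statistic)
        print(f"chi square = {statistic:.2f}, df = 1, p = {p:.3f}")
```

Both values exceed 3.84, the .05 critical value for one degree of freedom, so both fall below the .05 level, as the text states.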


DISCUSSION 


It was found that with increasing time without
food there was an increase in food-related
responses followed by a decrease. However,
the relationship was significant only when
strong food associations were eliminated.
Taken in conjunction with findings in other
studies (Clarke & Epstein, 1957; Lazarus,
Yousem, & Arenberg, 1953; Levine et al.,
1942; McClelland & Atkinson, 1948; Sanford,
1937; Wispé, 1954; Wispé & Drambarean,
1953), it would seem safe to conclude
that with increasing time without food
there is an increase followed by a leveling off
or decrease in at least certain types of food
response. This leveling off or decrease has
sometimes been interpreted to indicate the
operation of response-inhibition at higher
drive levels. There would be a serious
difficulty with such an interpretation in the
present study as the 23-hour deprivation group
rated themselves as no more hungry than the
8-hour group. Moreover, subjective estimates
of hunger at 23 hours were more likely
overestimates than underestimates, as they were
probably influenced by awareness of length of
deprivation. Thus, the decrease in food
responses at 23 hours may better be interpreted
as reflecting a leveling or falling off in
hunger rather than an inhibition of
drive-related responses, i.e., as indicating
drive-inhibition rather than response-inhibition.

Some evidence possibly supporting
response-inhibition was presented by the finding
that weak food associations were significantly
associated with time without food
while strong food associations, which are
more susceptible to inhibition, were not.
However, according to the model the strong
associations should have fallen off more rapidly
with increasing hunger than the weak food
associations, whereas the reverse tended to
occur. Thus, if the basic model is to be
preserved, it will be necessary to give further
consideration to the complicating effects of
response-inhibition.



The findings on ego strength offer some
interesting evidence for individual differences in
inhibition. For the groups of low ego strength,
a direct relationship was found between
food-related responses and deprivation through 23
hours of deprivation. For the groups of high
ego strength, a direct relation was found
through 8 hours of deprivation, but the
23-hour group produced significantly fewer
food-related responses than the 8-hour group. The
low and high ego strength groups both
produced negatively accelerated curves for
self-rated hunger as a function of deprivation.
However, the 8-hour and 23-hour groups of
high ego strength rated themselves as
significantly less hungry than the corresponding
groups of low ego strength. The combined
evidence suggests, in accordance with
hypothesis, that high ego strength subjects are
more apt to inhibit than low ego strength
subjects. If the self-ratings are accepted as a
true report of hunger, the extremely low number
of food-related responses produced by the
high ego strength group at 23 hours of
deprivation supports the occurrence of
response-inhibition, i.e., holding subjective
hunger constant, high ego strength subjects are
less apt to give drive-related responses than
low ego strength subjects.

Very likely inhibition of food-related imagery
and responses serves to reduce drive, so
that the two types of inhibition are not
unrelated. That response-inhibition need not be
a conscious process was indicated by the
questionnaire at the end of the study, in response
to which almost all subjects denied suppressing
food-responses.

Apart from the model proposed, the finding
that weak food associations varied
significantly with hunger while strong ones did
not is of considerable interest. Coupled with
negative findings on a difference between
instrumental and goal responses, it suggests that
a dimension of response-relevance is more
fundamental than an instrumental-goal
distinction. The reports in other studies
(Atkinson & McClelland, 1948; McClelland &
Atkinson, 1948; Wispé, 1954; Wispé &
Drambarean, 1953) that instrumental responses
provide better indices of drive than goal
responses can be explained by considering that
instrumental responses are generally of lower
drive-relevance than goal responses.
Moreover, grouping responses of food in an
inedible state (e.g., “wheat”) with food-related
implements (e.g., “table”) because they are
both presumably “instrumental” to eating
would seem more forced than classifying them
as of relatively low food-relevance.

In line with hypothesis, it was found that
food-related activity-responses were directly
and significantly associated with hunger while
food-related object-responses were not. The
hypothesis was based upon the consideration
that drive has both directing and activating
aspects, and that associations that reflect both
are better drive representatives than
associations that reflect only the directive aspect.
No support was found for the hypothesis 
that popular and accurate food-related re- 
sponses increase more regularly with hunger 
than unusual and inaccurate food-related re- 
sponses. The hypothesis was based on the as- 
sumption that a reality-oriented inhibitory 
process is dominant at higher drive states, at 
least up to a point of breakdown. Levine et al. 
(1942) concluded that as drive increases, the 
organism becomes increasingly realistic in re- 
sponding to drive-relevant stimuli. McClel- 
land (1951) takes much the same view in 
hypothesizing that with increasing deprivation
a “reality stage” follows a “wish fulfillment
stage” which only under intense deprivation
is superseded by a “defense stage”
where wish fulfillment again becomes dominant.
All that can be said at this point is that
the data are inconclusive, and that the
hypothesis of increasing accuracy of drive-related
responses with increasing drive up to
some limit, although reasonable, has yet to
be experimentally confirmed. The evidence
provided by Levine et al. (1942) is particularly
open to question, as it is based on repeated
testing of five subjects without control
for practice effects, and the explanation
was a posteriori.

A serious limitation in the present study
was the number and quality of food-related
responses elicited by the Rorschach test.
When this is considered together with the
number of comparisons made, and taken in
conjunction with evidence that set-effects in
laboratory studies are apt to be more important
than drive-effects (Clarke & Epstein,
1957; Postman & Crutchfield, 1952; Taylor,
1956), the need for replication under varied
conditions is clearly indicated. That factors
other than hunger were complicating the
food-related responses was indicated by the
bizarre nature of some of the responses, e.g.,
“Dante’s inferno, the bottom is the fiery
tombs of the heretics, on the sides are the
pigs of the gluttons and at the top are the
mournful souls who lived too early, the
virtuous pagans.” In laboratory investigations on
the directive influence of drive, a major
difficulty is that the magnitude of the drive is
relatively small in comparison with other ef- 
fects. Requiring the subject to abstain from 
eating itself introduces set-effects, and in- 
forming all subjects that the study is food- 
related offers only a partial solution, as the 
information supported by abstinence has a 
different effect from the same information not 
so supported. The only solution to this prob- 
lem is either to investigate no more than 4 or 
5 hours of deprivation, or to obtain subjects 
who have not eaten for reasons unconnected 
with the study. Fortunately, there were sev- 
eral subjects who were assigned to the con- 
trol group but were eliminated because they 
had missed one or more meals. Follow-up in- 
terviews were held with seven such subjects 
who had not eaten for 8-23 hours and who 
rated themselves as “hungry” to “very hun- 
gry.” Not one of these produced a strong 
food association and five produced weak food 
associations. The results on this group are 
consistent with those on the larger experimen- 
tal group, and suggest that set-effects do not 
explain away the findings on association 
strength nor the finding of a negative ac- 
celeration of food responses as a function of 
deprivation. 


SUMMARY 


The present study was undertaken to in- 
vestigate different categories of food responses 
to ambiguous stimuli as influenced by hunger 
and ego strength. Subjects consisted of male 
and female college students divided into four 
levels of deprivation: 41 who had not eaten 
for 0-1 hours, 40 for 4-5 hours, 22 for 8 
hours, and 21 for 23 hours. Food-related
scores and a measure of ego strength were
obtained from a Rorschach test.

The major findings were as follows:

1. Subjective hunger ratings as a function
of time without food increased through 8
hours, but did not increase further at 23
hours. This was interpreted as suggesting
that hunger itself may level off or decrease
somewhere between 8 and 23 hours of
deprivation, and that drive-inhibition probably
occurs.

2. Overall food imagery increased through
8 hours of deprivation and decreased at 23
hours. However, the relationship reached
statistical significance only when strong food
associations were eliminated. This was
interpreted as supporting drive-inhibition and
indicating that strong food associations, since
they are more easily inhibited, are more
susceptible than weak food associations to
influence by factors other than drive.

3. A group of high ego strength subjects
reported significantly less hunger at 8 and 23
hours of deprivation, and produced
significantly fewer food-related responses at 23
hours of deprivation, than a group of low
ego strength subjects. Only the high ego
strength group demonstrated a decrease in
food responses at 23 hours. It was concluded
that ego strength is related to the inhibition
of both drive and drive-related responses.

4. Food-related activity-responses were
significantly and positively related to
deprivation; food-object responses were not.

5. Significance was not found for goal and
instrumental food responses. It was proposed
that the goal-instrumental distinction could
be subsumed under a dimension of
drive-relevance of the response.

6. There was no evidence that food
responses become more accurate or
stimulus-determined as deprivation increases.


Recent years have witnessed an increasing
interest in time as a psychological variable. 
There has been concern with sources of vari- 
ance in time estimation and with individual 
differences in time orientation (Wallace & 
Rabin, 1960). The present study attempts to 
relate both of these dimensions. More spe- 
cifically, this study investigated the hypothe- 
sis that the range of one’s future time per- 
spective is a significant source of variance in 
the experience of duration: the longer the 
range of one’s future time perspective, the 
faster one’s internal clock. This hypothesis is 
based on the following assumptions: that the 
more the subject desires that an interval of 
time pass rapidly, the longer it will appear to 
be (Irwin, 1961, p. 235), and that subjects 
with a long range future time perspective are 
relatively more motivated that time pass rap- 
idly. A positive correlation is thus predicted 
between future time perspective and time 
estimation scores. There are also some em- 
pirical data which suggest the hypothesized 
relationship between future time perspective 
and the experience of duration. Hindle (1951) 
and Meade (1959) found that when engaged 
in a task, the subject’s estimation of elapsed 
time is a function of the distance of the goal 
of the task. The further the goal the greater 
the subject’s estimation of the elapsed time. 
This finding is consistent with a motivational 
approach to time estimation which holds that
the more the subject desires that an interval
of time pass rapidly, the longer it will appear
to be. Generalizing from this finding one can
predict a positive correlation between future
time perspective, defined as the relative
distance of one’s life goals, and time estimation.
A study by Knapp and Garbutt (1958) on
the relation between achievement motivation
and time imagery also suggests the
hypothesized relationship between future time
perspective and the experience of duration. In
this study it was found that subjects with
high achievement motivation described time
in metaphors which reflected a very rapid
internal clock. Since there is evidence which
indicates that need achievement is one of the
sources of future time perspective (Siegman,
in press-b), the findings of Knapp and
Garbutt (1958) suggest a positive correlation
between the range of subjects’ future time
perspective and the speed of subjects’ internal
clock.

1 This study was conducted while the author was
at Bar-Ilan University, Israel. The author is
grateful to A. Nir, Commissioner of Israel’s Prison
Service; to T. Givati, Director of the Tel-Mond
Prison; and to E. Carni of the Israel Army for their
help in the procurement of subjects. The author is
also indebted to Jacob Jonah for his assistance in
the collection of the data for this study.

The hypothesized positive correlation be- 
tween future time perspective and time esti- 
mation scores generates the additional predic- 
tion that delinquents will have shorter time 
estimation scores than nondelinquents. This 
prediction is based on the finding (Barndt & 
Johnson, 1955) that delinquents have signifi- 
cantly shorter future time perspectives than 
comparable nondelinquents 

A second objective of this study was to 
investigate the relationship between impulse 
control and future time perspective. LeShan 
(1952), in one of the first empirical investi- 
gations of future time perspective, suggests 
the hypothesis that the more thoroughly a 
child learns to delay the immediate gratifica- 
tion of his impulses for the sake of some other 
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Perspective, 


and more distant goal, the greater his future 
time perspective as an adult. Assuming that 
there is less learning of such control among 
lower-class than middle-class children, LeShan 
(1952) predicted and found significantly 
shorter future time perspectives among lower- 
children. Furthermore, assuming that 
delinquents have failed to acquire such delay 
capacity, LeShan (1952) argues that they 
should also have relatively more restricted 
future time perspectives. This prediction was 
confirmed in a_ subsequent investigation 
(Barndt & Johnson, 1955). These findings, 
however, cannot be considered as sufficient 
evidence for the hypothesis that future time 
perspective is a function of impulse control 
training. It is clear that delinquents and non- 
delinquents, and lower and middle-class chil- 
dren differ from each other in relation to 
many other variables than impulse control 
training, some of which may be responsible 
for the differences obtained in relation to fu- 
ture time perspective. Consequently, the pres- 
ent study investigated the hypothesized rela- 
tionship between future time perspective and 
impulse control training in a more direct 
fashion. Actually, this study investigated the 
relation between subjects’ future time perspec-
tive and subjects’ present impulse control
level.


SUBJECTS AND PROCEDURE


The delinquent group consisted of residents at
the Tel-Mond Prison for Young Offenders, Israel.
Subjects were selected according to alphabetical
order from the age range 17-19. The education of
this group ranged from 1 to 8 years with Mean 5.9
and SD 1.28. The nondelinquent group consisted of
subjects who, in order to control for
institutionalization, were selected from among
recent inductees in the Israeli Army. Subjects for
the nondelinquent group were selected so as to
obtain an age and educational distribution
identical to the experimental group. The two
groups were also equated for ethnic origin, with
77% of Middle-Eastern or North African origin and
23% of European origin. Subjects of both groups
were of lower socioeconomic background.

TABLE 1

FUTURE TIME PERSPECTIVE MEAN AND STANDARD
DEVIATION SCORES IN DELINQUENT AND
NONDELINQUENT GROUPS

[Rows: Delinquent, Nondelinquent; values not recoverable]

Subjects’ future time perspectives were determined
by a method similar to that used by Wallace (1956).
Each subject was asked to enumerate 10 events that
refer to things which he may do or which may happen
to him in the future. At the completion of this
task, the subject was asked to indicate what age he
thought he would be at the occurrence of each of
the events. The median of the differences between
the subject’s present age and the ages indicated by
the subject for the various events was used as the
subject’s future time perspective score.
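The scoring rule just described can be sketched in a few lines. This is only an illustration of the stated rule (the median of the differences between expected age at each event and present age); the function name `ftp_score` and the sample ages are hypothetical and do not come from the study.

```python
from statistics import median


def ftp_score(present_age, event_ages):
    """Median of (expected age at event - present age) over the listed events."""
    differences = [age - present_age for age in event_ages]
    return median(differences)


# Hypothetical subject, aged 18, giving an expected age for each of 10 events.
example_ages = [18, 19, 19, 20, 21, 22, 25, 30, 40, 60]
print(ftp_score(18, example_ages))  # median of 0,1,1,2,3,4,7,12,22,42
```

Because the median is used rather than the mean, a single very distant event (age 60 in the sketch) has little effect on the score.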

In the time estimation task, each subject was
presented with the following time intervals: 5, 25,
and 15 seconds. The beginning and end of each
interval was marked by the click of a stop watch.
The intervals between stimuli were 5 seconds. In
order to control for the serial position effect
found by previous investigators (Eson & Kafka, 1952;
Falk & Bindra, 1954; Siegman, in press-a), one half
of each group was presented with the stimuli in the
order in which they are listed, and the other half
in the reversed order. All subjects were told that
the purpose of this study was to determine how they
experienced various periods of time, not to count
off the seconds and not to make use of mnemonic
devices. Two subjects of the nondelinquent group
did not participate in the time estimation task.

Subjects’ impulse control level was determined by
means of a motor inhibition task. In this task,
subjects were instructed to trace a circle on onion
skin paper as slowly as possible. This task is a
variation on a task which was used by Singer,
Wilensky, and McCraven (1956), who asked subjects to
write certain words as slowly as possible. A factor
analytic study, and a number of other studies in
which this task was used as a measure of subjects’
impulse control level, provide considerable
construct validity for this kind of task (Singer et
al., 1956). Progressive Matrices (PM) scores were
available for all subjects of the control group.

RESULTS 

Table 1 indicates that the delinquent group
obtained, as was hypothesized, lower future
time perspective scores than the nondelin-
quent group. Since subjects’ future time per-
spective scores were not normally distributed,
the significance of the difference between the
two groups was evaluated by the Mann-Whit-
ney U test (Siegel, 1956, pp. 116-127). The
results were: U = 130.5, p 103 
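For readers unfamiliar with the statistic, the following is a minimal sketch of how a Mann-Whitney U is counted from two sets of scores. The data are invented for illustration and the function name `mann_whitney_u` is an assumption; it is not the author's computation. Ties between groups are counted as one half, which is how a half-integer value of U can arise.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y): pairwise wins of each group, ties counting one half."""
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Hypothetical future time perspective scores (years), not the study's data.
delinquent = [1.0, 2.0, 2.5, 4.0]
nondelinquent = [2.5, 3.0, 5.0, 6.0, 8.0]
u_d, u_n = mann_whitney_u(delinquent, nondelinquent)
print(min(u_d, u_n))  # the smaller U is the one referred to tables
```

The smaller the reported U relative to the product of the two group sizes, the less the two distributions overlap.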

TABLE 2

TIME ESTIMATION SCORES OF DELINQUENT AND CONTROL SUBJECTS

[Columns: time interval (seconds), delinquent group, nondelinquent group; values not recoverable]

Table 2 lists the time estimations of the de-
linquent and nondelinquent groups. As pre-
dicted, the delinquent group obtained the
lower time estimation scores. Because of het-
erogeneity of variance and because the time
estimation scores in the nondelinquent group
were not normally distributed, the signifi-
cance of the differences between the two
groups was evaluated by the nonparametric
median test (Siegel, 1956, pp. 111-116). The
chi square values in relation to the 5-, 25-,
and 15-second time intervals were 3.00, 6.75,
and 5.75, all of which are significant at less
than the .02 level.
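As an illustration of the median test named above, the sketch below splits two samples at their combined median and computes a chi square for the resulting 2 x 2 table. The function name, the use of a continuity correction by default, and the sample values are assumptions made for this example; the study's raw scores are not available here.

```python
from statistics import median


def median_test(x, y, yates=True):
    """Chi square (1 df) for counts above vs. not above the combined median."""
    grand_median = median(list(x) + list(y))
    a = sum(v > grand_median for v in x)  # group x, above the median
    b = sum(v > grand_median for v in y)  # group y, above the median
    c = len(x) - a                        # group x, at or below the median
    d = len(y) - b                        # group y, at or below the median
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # continuity correction, floored at zero
        diff = max(diff - n / 2, 0.0)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a marginal total is zero; chi square is undefined")
    return n * diff ** 2 / denom


# Hypothetical time estimates in seconds, not the study's data.
group_x = [2, 3, 3, 4, 4, 5]
group_y = [4, 6, 7, 8, 9, 10]
print(median_test(group_x, group_y))
```

Setting `yates=False` gives the uncorrected chi square for the same table, which is always at least as large as the corrected value.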

The correlations (Kendall’s tau) between 
subjects’ future time perspective scores and 
their estimation scores of the 5-, 15-, and 25-
second intervals were: .48 (p = .003), .38
(p = .02), and .48 (p = .003) in the non-
delinquent group and .27 (p = .04), .19 (p
= .14), and .19 (p = .14) in the delinquent
group. In a further analysis, subjects of the 
delinquent group were divided into high and 
low future time perspective scorers, depending 
on whether they scored above or below the 
group’s median future time perspective score. 
Table 3 compares the time estimations of the 
low and high future time perspective scorers. 
As can be seen in Table 3, all the differences 
were in the predicted direction: sum of all 
differences, p< .05; and for 5-second esti- 
mate difference, p < . 




The correlations (Kendall’s tau) between
subjects’ future time perspective and impulse
control scores were .30 (p = .02) in the de-
linquent group and .08 (ns) in the nondelin-
quent group.
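Kendall's tau, the rank correlation used throughout these results, can be sketched as the difference between concordant and discordant pairs divided by the total number of pairs. The version below is the simple tau-a form without a tie correction; the function name and the paired scores are hypothetical and serve only to show the counting.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # pairs tied on either variable count toward neither total
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical ranks: future time perspective vs. a time estimate.
ftp_ranks = [1, 2, 3, 4, 5]
estimate_ranks = [3, 1, 4, 2, 5]
print(kendall_tau_a(ftp_ranks, estimate_ranks))
```

In this example 7 of the 10 pairs are ordered the same way on both variables and 3 are reversed, so tau is (7 - 3) / 10.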

The correlation (Kendall’s tau) in the non- 
delinquent group between subjects’ PM and 
future time perspective scores was .15. The 
correlation (Kendall’s tau) in the same group 
between subjects’ PM and motor inhibition 
scores was -.03. Neither correlation is sig-
nificant.

DISCUSSION 


The results of the present study are con- 
sistent with the hypothesis that the range of 
a person’s future time perspective is one of 
the variables which determines the speed of 
a person’s internal clock and, consequently, 
his experience of duration. The greater the 
range of a person’s future time perspective 
the faster his internal clock. Conversely, the 
narrower the range of a person’s future time 
perspective, the slower his internal clock. This 
hypothesis readily explains why the delin- 
quent group, which obtained significantly 
lower future time perspective scores than the
nondelinquent, also obtained significantly
shorter time estimation scores than the non-
delinquent group.

TABLE 3

TIME ESTIMATION SCORES OF HIGH AND LOW FUTURE
TIME PERSPECTIVE (FTP) SCORERS IN THE
DELINQUENT GROUP

[Columns: time interval (seconds), N, M, SD for high FTP scorers and low FTP scorers; values not recoverable]

It has been suggested (Greenacre, 1945) 
that the delinquent lives primarily in the 
present, and that his lack of concern for the 
future is one of the factors responsible for his 
criminal behavior. Although the results of 
the present study do not permit any conclu- 
sions about causes of criminal behavior, they 
suggest a basic difference between the delin- 
quents’ and nondelinquents’ experience of 
time. The results of the study by Barndt and 
Johnson (1955), as well as of the present 
study, support the assumption that the de- 
linquent has a restricted future time perspec- 
tive. Furthermore, the present study suggests 
that time passes more slowly for the delin- 
quent. This may produce a sense of boredom, 
which in turn may be responsible for an ex- 
cessive need of stimulation which may be the 
immediate cause of the delinquent behavior. 
The results of the present study suggest the 
further investigation of these speculations. 

The results of the present study are less 
clear concerning the hypothesized positive 
correlation between impulse control and fu- 
ture time perspective. Although a significant 
positive correlation between subjects’ scores 
on the motor inhibition task and their future 
time perspective scores was obtained for the 
delinquent group, the correlation between 
these two variables in the nondelinquent 
group was not significant. One possible ex- 
planation for this difference is that, although 
impulse control training is responsible for the 
development of a future time perspective, this 
very training may also produce an overly 
cautious and conforming individual. Such an 
individual may be reluctant to take the risk 
involved in anything which extends into the 
distant and necessarily uncertain future. Con- 
sequently, such an individual would be rela- 
tively less future oriented. This speculation 
assumes that some of the high motor inhibi- 
tion task scorers in the nondelinquent group 
were such overly cautious and conforming in- 
dividuals. One is not likely, however, to find 
such individuals in a group of young offend- 
ers. Another possible explanation for the dis- 
crepancy between the two groups is that there 
is no causal relation between impulse control
and future time perspective and that the sig-
nificant correlation between these two vari-
ables in the delinquent group is due to the fact
that both variables are related to a third vari- 
able or set of variables which is associated 
with delinquent behavior. Briefly, the results 
of the present study do not permit a clear re- 
jection or acceptance of the hypothesized re- 
lationship between impulse control and future 
time perspective.

The nonsignificant correlation between sub- 
jects’ future time perspective and intelligence 
test performance is consistent with previous 
findings (Teahan, 1958). It should be pointed 
out, however, that in a preliminary investiga- 
tion (Siegman, 1959) the author obtained a 
significant positive correlation between future 
time perspective and a test of abstraction with 
general intelligence held constant. 

The nonsignificant negative correlation ob- 
tained in the present study between subjects’ 
motor inhibition task and intelligence test 
performance is inconsistent with the signifi- 
cant positive correlations obtained by previ- 
ous investigators between indices of impulse 
control and intelligence (Levine, Glass, & 
Meltzoff, 1957; Spivack, Levine, & Springle, 
1959). Perhaps the conflicting results are due 
to the fact that the intelligence test which 
was used by the previous investigators, the 
Wechsler-Bellevue, unlike the test which was 
used in the present study, includes tasks
whose successful completion requires impulse
control. In other words, there may be no cor- 
relation between impulse control and general 
intelligence, but merely between impulse con-
trol and certain intelligence test tasks which
require impulse control. Thus, it is doubtful
whether the results of these studies can be
considered as evidence for Rappaport’s for- 
mulation of the Freudian hypothesis (Rap- 
paport, 1951) which relates the development 
of “the apparatuses of cognition” to impulse 
control. 

The present study afforded the opportunity 
to put to an empirical test the widespread as- 
sumption that delinquents have less impulse 
control than nondelinquents. Although a num- 
ber of students of criminality have related this 
type of behavior to impulse control deficiency 
(Glueck & Glueck, 1950; Greenacre, 1945; 
Michaels & Steinberg, 1952), the hitherto
available empirical evidence is equivocal
(Siegman, 1961).


Since subjects’ motor impulse control scores 
were not normally distributed, the significance 
of the difference between the two groups was 
determined by the Mann-Whitney U test. 
The results were: U = 305, which is clearly 
not significant (p = .65). This finding sug- 
gests that delinquents are not, as is generally 
assumed, less able to control their impulses. 
The common observation that delinquents 
have a history of impulsive behavior may be 
due to the fact that they are less motivated 
to control their impulses rather than to de- 
fective control mechanisms. The fact that 
most delinquents come from a socioeconomic 
background which does not provide them with 
sufficient incentives for controlling their anti- 
social impulses (Cohen, 1955) is perhaps one 
factor responsible for this lack of motivation. 

Finally, the data of the present study make 
it possible to evaluate the relationship be-
tween impulsivity and the experience of dura-
tion. In a previous investigation (Siegman,
in press-a) a negative correlation was ob- 
tained between subjects’ scores on a motor 
impulse inhibition task and subjects’ verbal 
estimations of brief time intervals. This find- 
ing suggests that impulsive subjects have rela- 
tively more rapid internal clocks. The corre- 
lation failed, however, to reach the conven- 
tional .05 significance level. In the present 
study the correlations (Kendall’s tau) be- 
tween subjects’ scores on the motor impulse 
control task and their estimations of the 5-, 
15-, and 25-second time intervals were -.28
(p = .08), -.29 (p = .07), and -.27 (p = .09)
in the nondelinquent group, and -.21
(p = .10), -.22 (p = .08), and -.21 (p
= .10) in the delinquent group. Again the
correlations fail to reach the .05 level of sig- 
nificance. The fact, however, that the corre- 
lations vary consistently between the .05 and 
.10 level of significance is suggestive of the 
hypothesized relationship between impulsiv- 
ity and the experience of duration. 


SUMMARY 


A positive correlation was found in a group 
of young delinquents and in a comparable 
group of nondelinquents, between the range 
of subjects’ future time perspective and the 
speed of subjects’ internal clock as measured
by a time estimation task. 

The delinquent group obtained significantly 
lower future time perspective scores as well 
as lower time estimation scores. 

A significant positive correlation was ob- 
tained, in the delinquent group, between sub- 
jects’ future time perspective scores and their 
scores on a motor impulse control task. The 
correlation between these two variables in the 
nondelinquent group, however, was not sig- 
nificant. 

There was no significant difference between 
the delinquent and nondelinquent group in 
relation to their scores on the motor impulse 
inhibition task. 

Negative correlations were obtained be-
tween subjects’ time estimation and motor
impulse inhibition scores, which were signifi-
cant between the .05 and .10 level.


Finally, the results of the present study 
also suggest that there is no significant cor- 
relation between general intelligence and fu- 
ture time perspective or impulse control. 


REFERENCES 


Barndt, R. J., & Johnson, D. M. Time orientation
in delinquents. J. abnorm. soc. Psychol., 1955, 51,
343-345.

Cohen, A. K. Delinquent boys: The culture of the
gang. Glencoe, Ill.: Free Press, 1955.

Eson, M. E., & Kafka, J. S. Diagnostic implications
of a study in time perception. J. gen. Psychol.,
1952, 46, 169-183.

Falk, J. L., & Bindra, D. Judgment of time as a
function of serial position and stress. J. exp.
Psychol., 1954, 47, 279.

Glueck, S., & Glueck, E. Unraveling juvenile de-
linquency. New York: Commonwealth Fund, 1950.

Greenacre, P. Conscience in the psychopath. Amer.
J. Orthopsychiat., 1945, 15, 495-509.

Hindle, H. M. Time estimates as a function of
distance travelled and relative clarity of a goal.
J. Pers., 1951, 19, 485-501.

Irwin, F. W. Motivation and performance. Annu.
Rev. Psychol., 1961, 12, 217-242.

Knapp, R. H., & Garbutt, J. T. Time imagery and
the achievement motive. J. Pers., 1958, 26, 426-434.

LeShan, L. L. Time orientation and social class. J.
abnorm. soc. Psychol., 1952, 47, 582-589.

Levine, M., Glass, H., & Meltzoff, J. The inhibi-
tion process, Rorschach human movement re-
sponses, and intelligence. J. consult. Psychol., 1957,
21, 41-45.

Meade, D. R. Time estimate as affected by motiva-
tional level, goal distance, and rate of progress. J.
exp. Psychol., 1959, 58, 275-279.

Michaels, J., & Steinberg, A. Persistent enuresis
and delinquency. Brit. J. Delinqu., 1952, 3, 1-10.

Rappaport, D. Organization and pathology of thought.
New York: Columbia Univer. Press, 1951.

Siegel, S. Nonparametric statistics for the behavioral
sciences. New York: McGraw-Hill, 1956.

Siegman, A. W. Arahei hahevra vehitnahagut hayahid.
[Cultural values and human behavior.] In, Baavot
clalijot bahinuch haisraeli. Jerusalem: Ministry of
Culture and Education, 1959. Pp. 17-37.

Siegman, A. W. Theories of juvenile delinquency:
Some empirical investigations. Paper read at four-
teenth International Congress of Applied Psychol-
ogy, Copenhagen, August 1961.

Siegman, A. W. Anxiety, impulse control, intelligence,
and the estimation of time. J. clin. Psychol., in
press. (a)

Siegman, A. W. Some variables associated with fu-
ture-time perspective. Darshana, in press. (b)

Singer, J. L., Wilensky, H., & McCraven, V. G.
Delaying capacity, fantasy, and planning ability:
A factorial study of some basic ego functions. J.
consult. Psychol., 1956, 20, 375-383.

Spivack, G., Levine, M., & Springle, H. Intelligence
test performance and the delay function of the ego.
J. consult. Psychol., 1959, 23, 428-431.

Teahan, J. E. Future time perspective, optimism,
and academic achievement. J. abnorm. soc. Psy-
chol., 1958, 57, 379-381.

Wallace, M. Future time perspective in schizo-
phrenia. J. abnorm. soc. Psychol., 1956, 52, 240-
245.

Wallace, M., & Rabin, A. I. Temporal experience.
Psychol. Bull., 1960, 57, 213.

(Received April





Journal of Consulting Psychology, 1961


GROUP PSYCHOTHERAPY, A SPECIAL ACTIVITY PROGRAM, 
AND GROUP STRUCTURE IN THE TREATMENT 
OF CHRONIC SCHIZOPHRENICS 


JAMES M. ANKER 


Veterans Administration Hospital, Perry Point, Maryland


Activity programs and group psychotherapy 
frequently have been used in the treatment of 
the chronic patient. Evaluations of these pro- 
cedures, unfortunately, often have suffered 
from poor definition and choice of procedure, 
and inadequate design. This paper reports an 
experimental study of these therapies in com- 
bination with the effect of group homogeneity. 

The use of activity programs in neuropsy- 
chiatric hospitals received its major impetus 
from Myerson’s “total push” method in the 
treatment of chronic NP hospital patients 
(Myerson, 1939). Many authors since have 
reported varying degrees of success with this 
treatment method and progressive variations 
upon it (e.g., Cohen, 1959; Murray & Cohen, 
1959; Pace, 1957; Peters, 1955). Patients in
such programs usually have been assigned to 
a group and given or offered a relatively spe- 
cific “job.” Responsibility for performance 
typically rests with the supervisor or thera- 
pist and slightly, if at all, with the patient 
members. Generally this approach might be 
thought of as a systematized attempt to in- 
crease drive level by stimulation and encour- 
agement. 

Scher (1957a, 1957b) extended this con-
cept considerably with his emphasis upon the
“task orientation” of the patient. When re-
sponse to the task was demanded rather than
being requested, Scher concluded that signifi-
cant therapeutic changes were produced. A
controlled study by DiGiovanni (1958) failed
to replicate Scher’s results when the activity
group was compared with a psychotherapy
group and a control group, and all groups
were compared before and after. The treat-
ment procedures were conducted for only 4
months, however, as compared with 12 months
in Scher’s study.

1 A review of this paper was presented at the fifth
annual Research Conference, Cooperative Chemo-
therapy Studies in Psychiatry and Research Ap-
proaches to Mental Illness, Cincinnati, June 6-8,
1960, and at the eleventh semiannual Veterans Ad-
ministration-Universities Conference, Washington,
D. C., December 1, 1960.

2 Formerly counseling psychologist, Psychology
Service, Veterans Administration Hospital, Perry
Point, Maryland. Now Chief Counselor and Assist-
ant Professor in Psychology, Testing and Counseling
Center, University of Cincinnati.

Members of groups occurring naturally 
adopt responsibility and/or develop ways of 
delegating it to themselves and other mem- 
bers. They participate in a general division 
of labor. Krech and Crutchfield (1948) state
this mutual cohesiveness, as opposed to ex-
ternal pressure, is what constitutes a group.
It was presumed that this type of social or-
ganization could occur in a chronic schizo-
phrenic group and, to the extent that it did
occur, behavior alteration would be possible.
This kind of group may be distinguished
from many extant hospital activity groups by
the locus of responsibility, the patient mem-
bers themselves.

The activity program evaluated in this
study was designed to promote this “normal”
type of social organization. The criteria chosen
for it were as follows: (a) the group should 
have a definite goal or finished product which 
may be achieved in a relatively short period 
of time; (b) the goal should be periodic so
that once the immediate goal is achieved an- 
other similar one, but one presenting new 
challenges, takes its place; (c) there must be
a sufficient range of demand so that patients
at all levels of adjustment may contribute
meaningfully to the goal; (d) the activity
must be complex enough so that it will pose
meaningful problems to be resolved by the
group; and (e) this activity must be of such
a nature that patients are capable of main-
taining it with a minimum of staff interven-
tion, particularly professional staff. After
evaluating a number of possible programs, 
the activity chosen for study was the pro- 
duction of plays for hospital patients and 
personnel. The characteristics of this activity 
are described in more detail under “Method.” 

It would be insufficient to describe the type 
of group psychotherapy studied as “ortho- 
dox.” A review of pertinent literature in the 
area reveals a rather widespread range of ap- 
proaches to group psychotherapy with schizo- 
phrenics (e.g., Bach, 1954; Grauer, 1955; 
Klapman, 1946, 1947; Kramer, 1957; Lazell, 
1945; Peyman, 1956; Powdermaker & Frank, 
1953; Schnadt, 1955). The technique used in 
this study very closely resembles that de-
scribed by Kramer, with the possible excep-
tion of less emphasis being placed upon the
role of interpretation by the therapist. The 
atmosphere was permissive and designed to 
promote a growing sense of “belongingness”
by fostering a comfortable “family quality.” 
During the sessions primary emphasis was 
placed on nonpsychotic interactions between 
patients. Interaction was encouraged and im- 
plemented by the therapist but not demanded. 
The therapist patterned his orientation after 
that described by Frank (1952) by being “a 
perspicacious, strong, accepting person who 
structured the situation clearly for the pa- 
tients and supported them in their emotional 
turmoil.” Any level of nonpsychotic verbal
interaction was encouraged. This included 
topics like the difficulty in keeping personal 
belongings identified in the hospital laundry
and the problem of saving enough money for
passes. This technique might be contrasted to 
the didactic or pedagogical approach sug- 
gested by Lazell and Klapman. 


A number of authors (Hoffman, 1959; 
Kramer, 1957; Powdermaker & Frank, 1953), 
most writing on group psychotherapy, have 
speculated on the advantages of group homo- 
geneity or heterogeneity. While there is ac- 
tive disagreement in this area, the consensus 
favors some type of heterogeneity. Group 
structure, because of its implications for 
group treatment methods generally, was in- 
cluded as a main effect in this study. 


It was hypothesized that significant im-
provement in behavioral adjustment would
occur as a result of group psychotherapy, the
special activity program (drama group), and
heterogeneous group structure. The analysis
of these independent variables and of their
interactions constitutes the study reported
here.


DESIGN OF STUDY 


The three independent variables and their 
interactions were analyzed simultaneously by 
a 2 × 2 × 2 factorial design, each variable
being dichotomized. The effectiveness of group 
therapy was evaluated by contrast with a 
comparable group not receiving group ther- 
apy, the effectiveness of the drama group by 
contrast with a comparable group not in the 
special activity program, and the effective- 
ness of heterogeneity by contrast with a com- 
parable homogeneous group. Because a pa- 
tient’s original level of behavioral adjustment 
could influence the degree of change in ad- 
justment the data were adjusted for this effect 
by covariance. Additionally, it was impossible 
to insure that all patients would remain in 
the study until its conclusion. Because the 
length of exposure to the treatment pro-
cedures could affect the degree of change in
adjustive behavior, this effect was covaried as
well. Thus the design was a 2³ factorial analy-
sis of multiple covariance. The unique com-
binations of the three dichotomous variables
resulted in eight distinct “treatment” groups.
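The eight cells of the design can be enumerated mechanically, which may help in keeping the treatment combinations straight. The sketch below is illustrative only; the factor and level labels are shorthand chosen for this example, while the group size of seven and the total of 56 subjects are the figures given in the text.

```python
from itertools import product

# Three dichotomized independent variables (labels are illustrative shorthand).
factors = {
    "group_psychotherapy": ("yes", "no"),
    "drama_group": ("yes", "no"),
    "structure": ("heterogeneous", "homogeneous"),
}

# Every unique combination of levels defines one "treatment" group.
cells = [dict(zip(factors, levels)) for levels in product(*factors.values())]

for number, cell in enumerate(cells, start=1):
    print(number, cell)

GROUP_SIZE = 7  # group size stated in the text
print(len(cells), "groups,", len(cells) * GROUP_SIZE, "subjects")
```

With seven patients in each cell, the eight groups account for the 56 experimental subjects reported under Procedure.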


PROCEDURE


Selection of Subjects and Groups 


One-hundred-thirty-four male schizophrenic pa-
tients on a continued treatment ward of a 1,500 bed
Veterans Administration Neuropsychiatric Hospital
were rated on the Multidimensional Scale for Rating
Psychiatric Patients (Lorr, 1953). A pilot study of
interrater reliabilities produced an average reliability
coefficient of .85 taken over 11 ward personnel. The
average interrater reliability coefficient for three raters
on the interview section was .91. Coefficients in the
total matrix ranged from .66 to .96. This level of
reliability was considered sufficient to allow ratings
by different raters to be considered as comparable.
The protocols were scored for each patient and each
profile was compared with the hypothetical normal
profile by means of the D statistic (Osgood & Suci,
1952). A distribution of Ds thus was generated, one
end of the distribution reflecting maximum congru-
ence with normal behavior, the other end reflecting
maximum divergence, or pathological behavior. This 
distribution was normalized. Subjects for the four 
homogeneous groups were chosen randomly from pa-
tients having T scores between ±1 standard devia-
tion. The four heterogeneous groups each were com-
prised of two patients with T scores of less than −1
SD, two patients with T scores of greater than +1
SD, and three patients with T scores in the mid-
range. Based on evidence presented by Bales and
Borgatta (1955) and experience in group psycho-
therapy, group size was limited to seven. Each of the
four homogeneous and heterogeneous groups was
assigned randomly to the treatment combinations of
group psychotherapy and the drama group. All eight
groups had the same average level of pathology as 
measured by the Lorr scale. There were no signifi- 
cant differences between groups regarding age, dura- 
tion of hospitalization, or the taking of ataractics. 
Median age was 38.9 years. Median duration of hos- 
pitalization was 9.2 years. Fifty-three of the 56 ex- 
perimental subjects were on ataractics. 


Measures 


The principal dependent variable, behavioral ad- 
justment, was measured by the MACC Behavioral 
Adjustment Scale (Ellsworth, undated). The MACC 
produces scores entitled Motility, Affect, Coopera- 
tion, and Communication, and a Total Adjustment 
score, the sum of the Affect, Cooperation, and Com- 
munication scores. This scale has been shown to dif- 
ferentiate significantly between open and closed ward 
continued treatment patients, to be correlated sig- 
nificantly with the Hospital Adjustment scale and 
to be associated significantly with other measures of 
improved behavioral adjustment such as the length 
of time spent on pass. The scale is short, 14 items, 
and can be rated with high reliability. Pilot study 
data on ratings by pairs of raters used in the experi- 
ment produced interrater reliabilities ranging from 
.82 to .99 with an average coefficient of .92, taken
over the 15 combinations of six rater pairs. These
levels are consistent with reliabilities previously re-
ported for the scale.

Ancillary measures of group cohesiveness and so- 
cial choice were taken in the hope that the experi- 
mental groups would produce measurable changes in 
peripheral social behavior. The Semantic Differential 
profile given by each patient on himself was com- 
pared with the average profile he gave for other 
members of his group. A D statistic was calculated 
and interpreted as a measure of cohesion; a small D 
indicating cohesion.
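The cohesion measure can be sketched as follows, taking D in the usual Osgood and Suci sense of the square root of the summed squared differences between two profiles. The function names, the three-scale profiles, and the ratings are hypothetical; the sketch only illustrates comparing a self profile with the average profile given to other group members.

```python
from math import sqrt


def d_statistic(profile_a, profile_b):
    """Generalized distance: square root of the summed squared differences."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same scales")
    return sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))


def cohesion_d(self_profile, other_profiles):
    """D between a self profile and the mean profile given to other members."""
    count = len(other_profiles)
    mean_profile = [sum(scale) / count for scale in zip(*other_profiles)]
    return d_statistic(self_profile, mean_profile)


# Hypothetical Semantic Differential ratings on three scales.
self_rating = [4, 5, 3]
ratings_of_others = [[4, 4, 3], [4, 2, 3]]
print(cohesion_d(self_rating, ratings_of_others))
```

A D of zero would mean the patient described himself exactly as he described the average group member, so smaller values are read as greater cohesion.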

Social choice data were obtained in a free choice 
situation. All patients on the experimental ward ate 
at the same time in the same area of the dining hall. 
They were seated four to a table but had complete 
freedom to choose any table in the area and any 
companions from among their fellow patients. Actual 
choice of companions at the noon meal was recorded 
for the 56 patients in the study. These choices were 
then categorized as “in group” or “out group” choices. 


Method 

Following the pilot study on reliability and the 
selection of groups, all subjects were rated on the 
MACC, were given the Semantic Differential (which 
included their name and the names of the other 
members of their group), and were observed at their 
noon meal for 3 consecutive days and their choice 
of companions recorded. Sleeping arrangements were 
changed so that group members had adjacent beds. 
Simultaneously the experimental procedures began.
The four groups in group psychotherapy were seen
twice each week for 1½ hours; a total of 3 hours a
week. All groups were seen by the same therapist,
the senior author.

The four drama groups were formed into two
groups of 14, one homogeneous and one heterogene-
ous. In each group half of the patients were in group
psychotherapy and half were not. These groups met
three times a week for an hour. Generally they met
in the Recreation Hall with a staff moderator, a
recreational therapist from Special Services. This
moderator had been instructed to supply the groups
with all the material and information requested or
needed by them for the production of plays, but to
avoid taking over the “leadership” of the group. The
role of the moderator might best be characterized as
a “nondirective resource person.” This role proved
to be a difficult one to assume and was maintained
only by frequent consultations between the experi-
menters and the moderator. Difficulties appeared to
stem from the moderator’s identification with the
group himself. Thus, when a group once decided to
put on a play reading from the scripts, the moderator
became personally concerned over the adequacy of
their decision. The moderator was present for all of
the earlier meetings of the groups but missed some
as time went on and occasional conflicting duties
prevented his attendance. On those occasions the
groups met without him. At the beginning of the
study all patients in the drama groups were told in-
dividually that they had been chosen for a detail to
provide plays for the entertainment of the staff and
fellow patients. They were “assigned,” not given a
choice. Some patients protested leaving present de-
tails or simply engaging in an activity for which they
did not care. Most patients, however, accepted the
new “detail” with characteristic indifference. When
complaints occurred about belonging to the group,
patients were told that although their dissatisfaction
was understandable nothing could be done. Further,
it was pointed out that they were obliged to meet
this challenge but were free to do it in whatever
way they decided as a group.


The study continued for 1 full year with measures 
taken every 6 weeks. The nurses and nursing assist- 
ants rating subjects on the MACC were not made 
aware of the specific hypotheses of the study or of 
the subgroups into which their patients fell. Two 
raters from different shifts were used for each rating 
period. Ratings were based on observations of the 
subjects for the week immediately preceding the date 
the ratings were due. It was agreed that any subject
leaving the hospital within 2 months after the beginning
of the study was to be replaced by another
randomly selected subject. Patients who left after
longer than 2 months were counted as subjects but
were replaced in groups by another “equivalent” patient.
No data were collected on these “replacements,”
which were used only to maintain the groups at
their full strength.



RESULTS 


Because the primary dependent variable, 
the MACC, consisted of four subscores and a 
summary score, five separate analyses were 
done. In each case the analysis of the final 
scores was adjusted by multiple covariance 
for the effect of initial level and the length of 
time the subject was in the study. None of the 
main effects or interactions reached signifi- 
cance at the .05 level for the Affect subscores. 
The activity effect, however, with an F of 
3.969, narrowly missed significance at this 
level. The activity effect was significant for
all other subscores and for the total adjustment
scores: motility, p < .05; communication,
p < .01; cooperation, p < .01; and total
adjustment, p < .01. Group psychotherapy
reached significance at the .05 level for the
motility subscores, but was nonsignificant for
the other subscores and the total adjustment
scores. The group structure effect did not
reach significance on any of the measures. All
interactions were nonsignificant.

Analysis of the Semantic Differential distance
measures between self-rating and the
average rating of other group members revealed
no difference between original and final
measures that could be attributed to any
treatment or treatment combination. This
measure produced very high attrition because
of blank, incomplete, or obviously invalid
protocols. It is interesting to note, however,
that all distance measures used decreased
over time and that this difference was significant
at the .02 level.


Changes in choice of luncheon companions 
from outgroup to ingroup were practically nil. 
These social choices showed a remarkable consistency
over time and no significant differences
were obtained, either between treatment
groups and combinations of treatment groups 
or between original and final choices over all 
subjects. 


DISCUSSION AND CONCLUSIONS 


The most compelling result of this study is 
the consistency with which the activity group 
showed significant change on the various 
categories of the MACC Behavioral Adjust- 
ment Scale. Changes were significant on all 
but the Affect subscale where the F missed 
significance at the .05 level by a value of 
only .08. These changes uniformly were in 
the direction of improved behavioral adjust- 
ment. The significant change in the Motility 
subscale reflected a lessening of motility. The 
data suggest this was a decrease in behavioral
agitation and restlessness. Group psychother- 
apy showed a significant decrease in motility 
as well, but the data did not reach significance 
on the other subscales or on the Total Ad- 
justment score. No significant results were at- 
tributable to the homogeneity-heterogeneity 
variable. During the study 18 patients left on 
trial visit or discharge, 2 of whom returned
within 6 months. No group or treatment 
showed a significant difference in this regard. 

The fact that the significant differences 
found on the MACC for the activity group 
were not found in the Semantic Differential 
and social choice data for the same group re- 
inforces earlier questions about these meas- 
ures. A satisfactory method for screening in- 
valid Semantic Differential protocols was not 
found. While some protocols were obviously 
invalid, e.g., those showing an invariant re-
sponse pattern on the test form, in many cases
this judgment was difficult to make. When the 
validity of a protocol remained in question it 
was accepted as data and treated as valid. 
This is an arbitrary procedure at best. Reliabilities
on this instrument, using only the
“valid” protocols, calculated from immediate 
test-retest by replicated items in the test form, 
ranged from —.25 to .96 with an average of 
.58. The overall change from pre- to posttest, 
if interpretable at all, most likely reflects a 
change in therapeutic procedures on the ward 
which occurred simultaneously but independ- 
ently of the study. Overall rates of leaves, 
trial visits, and discharges also increased.

Although choice of luncheon companions
was intended to be a measure of the formation
of “real” groups resulting from the artificial
experimental groupings, it became obvious
that this behavior was extremely stable
and insensitive to change. Use of a behavioral
measure which did not have a previously
stereotyped pattern would have been advantageous.


The results of the study are encouraging. 
While only one treatment produced significant 
changes, it did so with compelling strength 
and consistency. The activity variable was 
responsible for most of the change in behav- 
ioral adjustment that occurred. It should be 
pointed out, however, that an important dif- 
ference existed between the therapy and the 
drama groups in addition to the differences 
in “treatment.” The group psychotherapist 
was a different person from the resource per- 
son associated with the drama groups. Thus 
it is possible that the results document differ- 
ences between the skills of these two people 
rather than between treatments as such. While 
this interpretation is possible it does not seem 
most parsimonious to the authors. This prob- 
lem was evaluated when the study was de- 
signed and there appeared no feasible way of 
separating person effect from treatment effect 
and maintaining an adequate design. Additionally,
the results favor the activity effect,
a treatment wherein the resource person had
only minimal contact. The amount and nature 
of contact with the subjects was specified as 
carefully as possible before the study began 
and every effort was made to insure they were 
maintained as such. Thus this problem does 
not affect the interpretation of the activity 
effect, which reached overall significance in 
any event. One could question the nonsignifi- 
cance of the results in most areas for the 
group psychotherapy effect, however. It is 
possible that a more skilled therapist, using 
the same procedures, might have produced 
more significant results. It was decided to 
spell out the group psychotherapy procedures 
as clearly as possible and have all groups 
seen by the same therapist to avoid confound- 
ing intertherapist differences. Because the 
drama activity had its own discrete charac- 
teristics, in addition to the criteria specified 
in the study, it is impossible to state with 
certainty the source of the significant differ- 
ences. It is clear, however, that the activity 
studied produced significant results in the 
predicted direction and there is reason to expect
that it would do so again at another time
or in another place. 

This latter finding taken alone should be 
of significance to those in mental hospitals 
concerned with the treatment of chronic 
schizophrenics. The activity program studied 
here produced consistent and significant be- 
havioral change with a minimum of staff in- 
tervention and expenditure of time. The “per- 
sonnel efficiency” of such a treatment method 
is unquestionably of value. Of much greater 
significance, however, is the fact that this in- 
expensive technique produces results which, in 
this study, were incomparably better than the
more “expensive” and time consuming group
psychotherapy requiring a highly trained
therapist. The implications of this study for
the systematic use of nonprofessional personnel
in the active treatment of chronic schizophrenics
are compelling as well as being attractive.

A number of refinements present themselves
for future study. There is paramount
need to vary activity content, programs by
holding basic selection criteria constant, to
evaluate the generality of the criteria. It
would be expected, of course, that any activity
program constructed to conform to the
basic criteria and administered as the one
currently studied would produce equivalent
results.
The difficult problem of therapist
“effects” in group psychotherapy requires further
attention. Although the homogeneity-heterogeneity
variable did not produce significant
results as it was studied, it is likely
that the method of study could be improved.
In this study it was advantageous to make


the central tendency in the two types of group 
structure equivalent, varying only the disper- 
sion. It is likely, however, that an effect due 
to group structure may interact with levels of 
pathology. Thus a study varying both effects 
systematically would provide informative data. 


SUMMARY 
Group psychotherapy, a specially designed
activity program, and the homogeneity or
heterogeneity of groups were evaluated as
therapeutic modalities in the treatment of
chronic schizophrenic patients. These variables
were studied simultaneously in a 2 X 2 X
2 factorial design with multiple covariance


THE USE OF AN EXTENDED DRAW-A-PERSON
TEST TO IDENTIFY HOMOSEXUAL AND
EFFEMINATE MEN


LEIGHTON WHITAKER, Jr.

Wayne County General Hospital, Michigan


Classical psychoanalytic theory holds that
individuals may have a psychological identification
with the same sex or with the opposite
sex (Fenichel, 1945). The concept of “psychosexual
identity” has been used in projective
testing of personality also. Regarding the
Draw-A-Person Test (the DAP), Machover
(1949, p. 101) has stated:


From the standpoint of sexual identification, it is
assumed to be most normal to draw the self-sex first.
From an empirical point of view, it is of interest
that evidence of some degree of sexual inversion was
contained in the records of all individuals who drew
the opposite sex first in response to the instruction,
“draw a person.”


Presumably what is crucial in this instruction
is that the individual is made to choose the
sex of the person drawn and thereby projects
his own psychosexual identity.

The present research represents a test of
the theoretical and practical significance of
choice of the sex of the figure in “free choice”
drawings on the DAP by utilizing the choice
as a psychometric sign to predict the characteristics
“homosexuality” and “effeminacy”
in men. As Meehl and Rosen (1955) have
pointed out, the predictive efficiencies of such
a psychometric sign must be evaluated relative
to the base rates of the characteristics.

1 Paper presented at the Michigan Academy of
Arts, Sciences, and Letters, Psychology Division,
Personality and Clinical Section, at Ann Arbor,
Michigan, March 1960.

2 Data for this research were collected while the
author was at Recorder’s Court Psychopathic Clinic,
Detroit, with the kind permission of Alan Canty,
Executive Director. Appreciation is expressed to John
MacBride, formerly of the Court Clinic, and Bertram
Cohen of Lafayette Clinic, Detroit, for their assistance
in the beginning and end phases of the research,
respectively.


METHOD 


Two hundred and thirty-six men aged 16 to 65
with an average age of 28, who were referred to two
clinical psychologists in a court clinic, served as
subjects. Each subject was first given an examination
which, at the minimum, consisted of a life-history
interview and the Verbal section of the Wechsler
Adult Intelligence Scale. Judgments were then made
by the same examining psychologist, on the basis
of the examination, as to whether the subject was
homosexual or effeminate. Finally, the subject was
given the extended DAP by the author. No attempt
was made to select subjects; with relatively few
exceptions, all men referred to the two psychologists
during a period of 7 months were used in the research.

Selection of the criteria according to which a subject
was judged homosexual or effeminate presented
three particular problems. First, the meaning of these
terms varies considerably according to the individuals
using them. It was necessary to give these terms
highly explicit definitions so as to allow high criterion
reliability. Second, the information on which the
judgments were based varies somewhat in the kind
and extent available. No special means was found to
overcome this problem. Third, since not all homosexual
men are effeminate according to psychoanalytic
theory, separate judgments of homosexuality
and effeminacy had to be made. However, it was
assumed that most homosexual men do have a
feminine psychosexual identity and it was expected
that as a group, therefore, they would project this
personality feature into their free choice drawings on
the DAP.

The label “homosexual” meant admission by the
subject to the examining psychologist of one or more
instances of manifestly sexual impulses and/or behavior
toward a person of the same sex when both
parties were past pubertal age. The label “effeminate”
meant one or more of the following: (a) display of
effeminate speech, gestures, mannerisms, or dress during
the examination; (b) personal description by the
subject to the psychologist of playing a manifestly
feminine role where, for one or more years, most of
the subject’s activity consisted of housework, sewing,
washing, ironing, infant care, etc.; (c) personal
description by the subject to the psychologist of very
strong interest in feminine activities such
as the above. A subject was rated homosexual or
effeminate only if the evidence was unequivocal.


TABLE 1
DISCRIMINATION OF HOMOSEXUAL AND EFFEMINATE MEN BY DRAW-A-PERSON
[Table body not recoverable. Legible labels: Homosexual, Effeminate; Regular DAP, Extended DAP-C.]

The two psychologists who examined and rated the
subjects each had at least 3 years training and/or
experience beyond the master’s degree in clinical
psychology. To provide an estimate of the reliability of
their ratings, each psychologist independently rated
the first 40 subjects for homosexuality and effeminacy.
There was complete agreement in rating homosexuality
and agreement in 38 of the 40 cases in rating
effeminacy.

The extended version of the DAP was administered
by the author as follows: Each man was told
“draw a person.” When the sex of the figure drawn
was decided by the man, he was told “Now draw a
person of the opposite sex.” Finally, a third drawing
was obtained where, for the second time, the man
chose himself the sex of the figure. The test procedure
thus extended the regular DAP where the subject
produces only two drawings with the sex of only
one of these chosen by himself. It was possible,
therefore, to compare the discriminating powers of the
regular and extended versions of the test.

Since each man produced two free choice drawings,
i.e., drawings which could be either of male or
female figures, there were four possible ways in which
a man’s DAP might be scored positive in psychometric
sign. In the regular version of the test a positive
psychometric sign is scored when the first drawing
is of a female figure. There are three additional
methods of scoring possible with the extended version
of the test. Method A scores a protocol
positive in psychometric sign if either free choice
drawing is female. Method B scores positive if the
second free choice is female. Method C scores positive
if both free choices are female.

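The four scoring rules just described can be sketched in a few lines of Python. This is an illustrative sketch only: the function name, argument names, and dictionary keys are hypothetical, and the input is assumed to be the sex of the first and third drawings (the two free choices), with the second drawing ignored because its sex is fixed by instruction.

```python
from typing import Dict


def score_dap(first_free_choice: str, second_free_choice: str) -> Dict[str, bool]:
    """Return the psychometric sign (True = positive) under each scoring method.

    first_free_choice  -- sex of the first drawing ("male" or "female")
    second_free_choice -- sex of the third drawing, i.e., the second free choice
    """
    first = first_free_choice.strip().lower() == "female"
    second = second_free_choice.strip().lower() == "female"
    return {
        "regular": first,                # first drawing is female
        "extended_A": first or second,   # either free choice is female
        "extended_B": second,            # second free choice is female
        "extended_C": first and second,  # both free choices are female
    }


if __name__ == "__main__":
    # A man who draws a male first and a female on the second free choice
    print(score_dap("male", "female"))
```

Written this way, a protocol positive under Method C is necessarily positive under the regular method and under Methods A and B, and Method A is the most inclusive rule.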


RESULTS AND DISCUSSION 
As shown in Table 1, all four methods of
scoring psychometric sign discriminated the
characteristics homosexual, effeminate, and
homosexual and/or effeminate beyond chance
levels. This aspect of the results supports the
theoretical expectation, based upon psychoanalytic
and projective test concepts of psychosexual
identity, that psychosexual identity
is projected into the choice of sex of the figures
drawn in free choice drawings on the
DAP.

As shown in Table 2, when the efficiencies
of the various signs are compared with the
efficiencies of classifying everyone simply as
not possessing the characteristic in question,
it is clear that the signs have no appreciable
value except for predicting the characteristic
homosexual and/or effeminate. Even here the
improvement is most modest, at best 84%
vs. 75%. However, differential weighting of
false positives vs. false negatives might alter
conclusions, dependent upon which error was
judged worse. For example, if it were of primary
importance to screen out all men with
the characteristics and of only secondary importance
to avoid screening out some men
without the characteristics, then the psychometric
signs may be of practical use. As shown
in Table 3, the extended DAP with Scoring
Method A screened out [illegible], 84%, and 80% of
the men with the characteristics homosexual,
effeminate, and homosexual and/or effeminate,
respectively. At the same time 57%, 75%,
and 43% of the men without these respective
characteristics were screened out.


TABLE 2
PREDICTIVE EFFICIENCIES OF THE PSYCHOMETRIC SIGN
[Title partly illegible; table body not recoverable. Rows: base rates, Regular DAP, Extended DAP-A, Extended DAP-B, Extended DAP-C.]


TABLE 3
MEN WITH THE CHARACTERISTICS WHO ARE POSITIVE IN PSYCHOMETRIC SIGN
[Table body not recoverable. Rows: Regular DAP, Extended DAP-A, Extended DAP-B, Extended DAP-C.]


In view of the potential usefulness of the
extended DAP as a screening device, it is
suggested that the test be tried in other settings
where cross-validation data could be obtained.
Perhaps the greatest obstacle to ob-
taining such data will be the difficulty, char- 
acteristic of attempts to establish the validity 
of projective techniques, in finding adequate 
criterion measures. Some observations in the 
present research suggest that more adequate 
criterion measures would have shown the ex- 
tended DAP to be a more powerful discrimi- 
nator of psychosexual identity than the tables 
of results indicate. For example, of the nine 
homosexual men who were not positive on the 
extended DAP with Scoring Method A, five
limited their homosexual activity to playing 
the “‘masculine” role and would not, therefore, 
be expected to have a feminine psychosexual 


identity according to psychoanalytic theory 
(Fenichel, 1945). 
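The comparison of a sign's efficiency with the base rate, in the sense of Meehl and Rosen (1955), can be made concrete with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the example figures are assumptions pieced together from the text (a base rate of .25 for homosexual and/or effeminate, inferred from the 75% efficiency of classifying everyone as not possessing the characteristic, combined with the 80% and 43% screening rates reported for Scoring Method A).

```python
def predictive_efficiency(base_rate: float, hit_rate: float,
                          false_positive_rate: float) -> float:
    """Proportion of all men classified correctly by the sign.

    base_rate           -- proportion of men who have the characteristic
    hit_rate            -- proportion of men WITH it who are sign-positive
    false_positive_rate -- proportion of men WITHOUT it who are sign-positive
    """
    true_positives = base_rate * hit_rate
    true_negatives = (1.0 - base_rate) * (1.0 - false_positive_rate)
    return true_positives + true_negatives


def base_rate_efficiency(base_rate: float) -> float:
    """Efficiency of ignoring the test and always predicting the more common class."""
    return max(base_rate, 1.0 - base_rate)


if __name__ == "__main__":
    # Assumed illustrative values; see the paragraph above for their origin.
    p, hits, false_pos = 0.25, 0.80, 0.43
    print(round(predictive_efficiency(p, hits, false_pos), 4))
    print(base_rate_efficiency(p))
```

Under these assumed figures the sign classifies about 63% of men correctly, below the 75% obtained by calling everyone negative, even though it screens out 80% of the men with the characteristic. This is the trade-off between false positives and false negatives that the discussion describes.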


SUMMARY 


Two hundred and thirty-six men, referred 
to a court clinic, were rated on the charac- 
teristics homosexuality and effeminacy by a 
clinical psychologist on the basis of life-his- 
tory interviews which he conducted. Each man 
was then given an extended Draw-A-Person 
Test on which he chose the sex of two of the 
three figures drawn. All four possible methods
of scoring a test protocol as “positive” in
psychometric sign (one or more free choice
drawings of a female) were used to predict 
the characteristics. The results support the 
theoretical expectation, based on psychoana- 
lytic and projective test concepts of psycho- 
sexual identity, that psychosexual identity is 
projected into free choice drawings. The psy- 
chometric signs were not more efficient, over- 
all, than the base rates in predicting the char- 
acteristics. However, differential weighting of 
false positives vs. false negatives might alter 
conclusions about the practical usefulness of 
the signs, dependent upon which error was 
judged worse. It was suggested that the ex- 
tended Draw-A-Person Test be used in other 
settings where cross-validation data could be 
obtained.


The long-standing assumption that the K
scale is a measure of defensiveness stems di- 
rectly from the nature of its derivation—the 
detection of hospitalized psychiatric patients 
who presented normal profiles on the MMPI. 
However, two lines of evidence have sug- 
gested that K scale scores may also relate to 
general level of adjustment. For one, Wheeler, 
Little, and Lehner (1951) found that normal 
groups scored higher on the K scale than ab- 
normal groups, and their interpretation of 
these findings was consistent with general 
clinical practice which has for a number of 
years stressed K scale elevations as an index 
of ego adequacy. The other line of evidence is 
the rather consistent finding that K scale 
scores show an increase when posttreatment 
MMPI scores are compared with pretreat- 
ment (Carp, 1950; Feldman, 1952; Gallagher, 
1953; Hales & Simon, 1948; Schofield, 1953). 

More recently, Smith (1959) has provided 
evidence which suggests the K scale is not an 
adequate measure of defensiveness for normal 
populations. He found a significant negative 
correlation (—.39) between K scores and 
judgments of defensiveness made by subjects 
with 40 hours of observation on which to base 
their ratings. Smith used the results of his 
study to argue that “it is defensive for ab- 
normal population Ss to obtain high K scale 
scores but a sign of health for normal popu- 
lation Ss” (p. 276). 

If true, the implications of a differential 
psychological meaning for K scale scores for 
abnormal and normal populations are serious. 
The original addition of a K increment to 
five of the nine MMPI clinical scales followed 
a demonstrated enhancement of discrimina- 
tion between largely inpatient psychoneurotic 
and psychotic groups and a general Minnesota 
“normal” group. However, extension of this 


weighting system to personality measurement 
within a normal population assumes that (a) 
K measures defensiveness, (b) defensiveness 
is associated with a lowering of MMPI scale 
scores, and (c) the correction by adding a K 
increment raises these scale scores and pro- 
vides a more veridical assessment of the per- 
son. However, if K is positively correlated 
with psychological adjustment for normal sub- 
jects and is not a measure of defensiveness, 
the K correction would appear to be operat- 
ing in direct opposition to test validity. That 
is, higher K scores would tend to be associ- 
ated with better adjustment for normal sub- 
jects; yet the higher the K scale score, the 
greater the K correction and the more the 
elevation of the clinical scales in the psycho- 
pathological direction. The problem of appro- 
priate K usage in normal population testing 
was foreseen by the original workers (Mc- 
Kinley, Hathaway, & Meehl, 1948) with the 
MMPI who stated: 


For other clinical purposes it is possible that other
λ-values [i.e., K weights] would be more appropriate.
Thus, it seems likely that for the best separation
of “maladjusted normals,” such as those which
abound in a college counseling bureau, other
weights might be better (p. 24).
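The argument about the K correction rests on simple arithmetic: a fraction of the raw K score is added to certain clinical scales, so a higher K raises those scales. A minimal sketch follows. The weights are the conventional K fractions for the five corrected scales (Hs, Pd, Pt, Sc, Ma); they are not given in this text and are supplied here as an assumption, as are the function and variable names. Rounding and conversion to T scores are omitted.

```python
# Conventional K fractions for the five K-corrected MMPI scales
# (assumed here; the article itself does not list them).
K_WEIGHTS = {"Hs": 0.5, "Pd": 0.4, "Pt": 1.0, "Sc": 1.0, "Ma": 0.2}


def k_corrected(raw_scores: dict, k_raw: float) -> dict:
    """Add the K increment to each scale; scales without a weight are unchanged."""
    return {
        scale: raw + K_WEIGHTS.get(scale, 0.0) * k_raw
        for scale, raw in raw_scores.items()
    }


if __name__ == "__main__":
    raw = {"Hs": 10, "D": 20, "Pt": 15}
    # The same raw profile under a low and a high K score
    print(k_corrected(raw, k_raw=10))
    print(k_corrected(raw, k_raw=20))
```

With the raw profile held constant, the higher K score yields higher corrected Hs and Pt values. If K reflects good adjustment in normal subjects, the correction therefore moves better-adjusted subjects in the psychopathological direction, which is the concern raised in the text.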


Results such as those provided by Smith’s 
study which raise questions about the appro- 
priateness of standard MMPI usage in nor- 
mal populations warrant careful scrutiny. It 
is the purpose of the present study to evalu- 
ate two hypotheses suggested by Smith. These 
are: 

1. The K scale is a measure of psychological
health (adjustment) in a normal population.

2. The K scale is not a measure of defensiveness
in a normal population.


Significance of MMPI K Scale


METHOD 


Definition of Differing Adjustment Groups 
within a Normal Population 


To test the hypothesis that K scale performance is
positively related to level of psychological health
within a normal population, two types of groups
from a normal college population clearly differing
in adjustment level were selected. A group of college
males (N = 146) who had sought help at the
University Counseling Service, State University of
Iowa, constituted a poorer adjusted normal (PA)
group. This number included 100 subjects who had
requested vocational and/or educational counseling
and 46 subjects who sought counseling for personal
adjustment problems. The male better adjusted (BA)
group was comprised of 153 students, none of whom
had been seen in the Counseling Service.

Parallel female PA and BA groups were also constituted.
The female PA group included 143 Counseling
Service clients, 100 having requested help for
vocational and/or educational problems and the remaining
43 for personal adjustment problems. The
female BA subjects (N = 197) were students who
had not requested help at the Counseling Service.
Both the male and female PA groups approximate
representative samples of Counseling Service clients
as far as the proportion of vocational-educational
and personal adjustment counseling requests are
concerned.

Each of the 639 subjects included in the male and
female BA and PA groups had taken the MMPI under
one of two conditions: (a) as part of the university
prefreshman entrance battery in the summer
of 1958 or 1959, or (b) as part of the Counseling
Service intake battery. All of the BA subjects took
the MMPI as part of the prefreshman battery, and
about 90% of the subjects in the PA groups did.
Thus, the K scale scores for the four adjustment
groups are provided by very homogeneous samples
relative both to age and educational level at time
of testing. Since there is some evidence (Sarason,
1956) that K scores are related to intellectual ability
in college subjects, separate preliminary analyses of
this relationship for the males and females in the
present samples were conducted. The product-moment
correlation between K and the mean composite
percentile on the university entrance examination for
both males (N = 319) and for females (N = 373)
was .11, significant at the .05 level. Although this
figure suggests a relationship of limited magnitude,
the four adjustment groups were matched for ability
level. The group composite percentile means were
male PA = 53.91, male BA = 53.7, female PA =
54.47, female BA = 54.60.

Based on Smith’s hypothesis that the K scale measures
psychological health in a normal population, it
was predicted that the BA male and female groups
would score higher on the K scale than would the
PA groups, level of adjustment being defined in terms
of soliciting or not soliciting help for psychological
problems subsequent to testing.





Definition of Defensiveness 


For purposes of his investigation, Smith used the
definition of defensiveness provided by Page and
Markowitz (1955) as follows: “The defensive individual
is described as one who fails to ascribe to
himself characteristics of a generally valid but socially
unacceptable nature” (p. 431). In the present
study the concept of defensiveness was extended so
as to include the self-ascription of characteristics
which are not valid but are socially acceptable as
well as the denial of valid but unacceptable
characteristics.



Since adjustive behaviors are much more likely to
be socially acceptable than nonadjustive behaviors,
one difficult aspect of deriving a useful measure of
defensiveness in a normal population is the increased
probability that a socially acceptable self-description
is also factually correct. In the present study this
problem of confounding defensiveness and accurate
self-description was approached by using the
self-descriptions of a group of subjects at the maladjusted
end of a normal population adjustment range.
Self-descriptions on the 300-item Adjective Check List
(ACL) (Gough, 1960) of 50 male college students
who sought help for personal adjustment problems
at the Counseling Service were scored for the number
of endorsed adjectives which were included in
the 75 judged to reflect most favorably on the endorser
and the number included in the 75 judged to
reflect least favorably on the endorser (see Gough,
1955). Subtracting the latter count from the former
provided a “favorability” count for each subject.
By cutting the distribution of favorability scores at
the median, a group of subjects giving more favorable
self-descriptions and a group giving less favorable
self-descriptions were obtained. Since all subjects
were maladjusted it was assumed that the
subjects giving more favorable self-descriptions represented
a more defensive group. A “Defensiveness”
(Def) scale was then developed for the ACL by determining
through chi square analysis which adjectives
out of the total 300 reliably discriminated between
the high and low favorability subjects. The
61 adjectives which discriminated were cross-validated
on a new sample of 34 male personal adjustment
counseling subjects, and 28 of these adjectives
significantly differentiated the newly constituted high
and low favorability groups. These 28 adjectives 1
(composed of 27 favorable adjectives and one unfavorable
adjective which became a subtractive
item) were included in the male Def scale. It should be
noted that the mode of derivation of the male
Def scale (as well as the female scale described below)
parallels that of the K scale in the inclusion of
items which discriminated between maladjusted subjects
who portrayed themselves psychometrically in
an unduly favorable light and those subjects who
did not.

1 The lists of adjectives included in the male and
female Defensiveness scales may be obtained without
charge from Alfred B. Heilbrun, Jr., Department
of Psychology, State University of Iowa, Iowa
City, Iowa.


Alfred B. Heilbrun, Jr.


TABLE 1
MEANS AND SDs OF MALE ... OBTAINED UNDER STANDARD, ... [title incomplete; only the male means are recoverable]

Testing condition    Male mean
Standard             15.30
Defensive            18.63
Ideal-self           21.00
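The favorability count and median split used to form the high and low groups can be sketched as follows. This is an illustrative sketch, not the author's procedure in detail: the function names and the toy adjective sets are hypothetical stand-ins for the 75 most favorable and 75 least favorable ACL adjectives, and the handling of scores falling exactly at the median (assigned here to the low group) is an assumption, since the text does not say how ties were treated.

```python
from statistics import median


def favorability(endorsed, favorable, unfavorable) -> int:
    """Favorable adjectives endorsed minus unfavorable adjectives endorsed."""
    endorsed = set(endorsed)
    return len(endorsed & set(favorable)) - len(endorsed & set(unfavorable))


def median_split(scores: dict):
    """Split subjects into (high, low) favorability groups at the median.

    Assumption: subjects scoring exactly at the median go to the low group.
    """
    cut = median(scores.values())
    high = sorted(s for s, v in scores.items() if v > cut)
    low = sorted(s for s, v in scores.items() if v <= cut)
    return high, low


if __name__ == "__main__":
    # Toy adjective sets for illustration only
    favorable = {"calm", "kind", "capable"}
    unfavorable = {"rude", "lazy", "bitter"}
    print(favorability(["calm", "kind", "rude"], favorable, unfavorable))
    print(median_split({"s1": 5, "s2": 1, "s3": 3, "s4": -2}))
```

Adjectives that separate the two groups on a chi square test would then be candidates for the Def scale, subject to cross-validation on a new sample as the text describes.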

It was found that for a sample of 97 normal
male students the Def score was positively correlated
with total number of adjectives checked
(r = .71), so a correction was necessary. The mean
number of endorsed adjectives for these subjects
was 95.16 with an SD = 30.08, while the SD for
the Def scale was 3.28. A correction of one point
(about one-third SD) on the subject’s Def scale
score was made for each deviation of 10 endorsed
adjectives (about one-third SD) of the total checked
from the group mean of 95, added if the total number
was below this figure and subtracted if the total fell
above 95. Six to nine adjectives were counted as a
point. This correction successfully removed the contingency
of Def scores upon total adjective endorsement
as attested by the insignificant correlation of
r = .08 between these variables for a new group of
male college students (N = 109). The reliability of
the male corrected Def scale scores is .67, based on
a 10 week test-retest sample of 43 college subjects.
A defensiveness measure for the ACL was derived 
for female college subjects by the same procedures 
as for the males. The original group of maladjusted 
Counseling Service subjects consisted of 4 
while the replication sample included 55 
Seventy-two 


females 
subjects 
adjectives reliably discriminated be 
tween high and low favorability groups in the origi- 
nal sample, while 36 items held up on cross-valida- 
tion and were included in the female Def scale. Of 
this number, 28 were 
were unfavorable, thus 
cumulative 


favorable adjectives and 8 
being subtracted from the 
There were 10 overlapping items 
between the male and female scales 

(r 66) between 
female Def scale scores and number of adjectives 
checked was found for a sample of 114 college: fe- 
males. Correction was made based on the follow 
ing statistics: mean number of adjectives checked 
= 92.63; SD of total adjectives checked = 30.75; and 
SD of Def scale = 6.97. It was found that using the 
adding or subtracting of two Def scale points (about 
one-third SD) for each 10 adjectves checked (about 
one-third SD) above 
partial credits for 
produced a 


score 


A similar positive correlation 


or below 9 
part 10) overcorrected and 
negative correlation of — .64 between 
Def scores and total adjectives. The same correction 
used with the male scale was then applied and quite 


adjectives (with 


ot 


AND FEMALE CORRECTED DEFENSIVENESS AI S 
DEFENSIVE, 


CORED ON ACLs 


AND IDEAL-SELF YITIONS 


successfully eliminated relationship (r 03) 
between Def scores and total adjectives checked, 
based on the performances of the 103 additional fe 
The 10 week 


for females 


male college subjects test-retest reli 
ability of the Def scale (N = 56) is .79 
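
The one-point-per-ten-adjectives correction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and parameters are hypothetical, and the rounding rule is one possible reading of "six to nine adjectives were counted as a point" (a leftover deviation of 6 to 9 adjectives earns one further point, a leftover of 0 to 5 earns none).

```python
def corrected_def(raw_def: int, total_checked: int,
                  group_mean: int = 95, step: int = 10,
                  points_per_step: int = 1, partial_min: int = 6) -> int:
    """Adjust a raw Def score for the total number of adjectives checked.

    Hypothetical helper: the defaults follow the figures given in the text
    (group mean figure of 95, one point per 10 adjectives).
    """
    deviation = total_checked - group_mean
    full_steps, remainder = divmod(abs(deviation), step)
    # Assumed reading of the partial-credit rule: a leftover of 6-9
    # adjectives counts as one more correction step.
    if remainder >= partial_min:
        full_steps += 1
    correction = full_steps * points_per_step
    # Points are added when the total is below the mean figure and
    # subtracted when it is above.
    if deviation < 0:
        return raw_def + correction
    return raw_def - correction


if __name__ == "__main__":
    for total in (75, 95, 100, 111):
        print(total, corrected_def(15, total))
```

Keeping the step size and point value as parameters makes it easy to compare the one-point rule with the two-point rule that the text reports as overcorrecting for the female scale.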

Some preliminary evidence was available to evaluate the male and female Def scales as measures of defensiveness. These scales were scored on ACLs administered to normal college subjects under three conditions: (a) a standard instruction research condition; (b) a standard instruction defensiveness-inducing condition (Heilbrun, 1958); and (c) an ideal-self-description instruction condition (Heilbrun, 1958). If Def scales are measures of defensiveness, scores might be expected to show a progressive increase over these three conditions. Table 1 shows this expected progressive mean increase as well as an increasing homogeneity of Def scores. Tests of significance showed the difference in standard and defensive condition means for males (t = 3.15 for 123 df; p < .01) and females (t = 5.50 for 127 df; p < .001) to be highly significant. Various considerations (e.g., overlapping subjects, heterogeneous variances) made testing between defensive and ideal-self conditions for mean differences unfeasible. Also, to evaluate whether the Def scales may be reflecting differences in adjustment level rather than defensiveness, the Def scale means for groups of more poorly adjusted Counseling Service males (15.26; SD = 5.25; N = 109) and females (16[?]; SD = 6.05; N = 103) were compared to the means of the normal (i.e., “standard condition”) male (15.30) and female (16.85) groups reported in Table 1. The almost identical means indicate that Def scores are not measures of adjustment level.

Despite the preliminary evidence that the Def scales do measure defensiveness in self-description for normal college population subjects, it is clear that scores on these scales still confound defensiveness and true adjustment level (i.e., some high scorers are truly well adjusted; some low scorers are truly maladjusted). All that is contended is that proportionally more of the performance variance on the Def scale is attributed to defensiveness than would be for performance on the entire ACL, the proportion in either case being unspecifiable.

To test Smith’s hypothesis that the K scale is not a measure of defensiveness in a normal population, two correlational analyses were conducted: (a) the K scores of 103 Counseling Service female and 109 Counseling Service male college students were correlated with their Def scores on the ACL, and (b) the K scores of 141 normal college female and 92 normal college male subjects were correlated with their ACL Def scores. ACLs for all subjects in the Counseling Service groups were administered as part of the Counseling Service intake battery at a variable time (from a few moments to almost 2 years) following the administration of the MMPI. The normal subjects were given the ACL under research conditions from 1 to 18 months after taking the MMPI.


RESULTS AND DISCUSSION 


Hypothesis I: The K scale is a measure of 
psychological health in a normal population 


The mean K standard score for the male PA group was 55.62 (SD = 8.55) compared to a mean K standard score of 54.18 (SD = 8.35) for the BA male subjects. These mean values do not differ significantly from each other (t = 1.43 for 299 df; .15 < p < .20). The female PA group had a mean K score of 56.83 (SD = 7.25) whereas female BA subjects had a mean K score of 58.02 (SD = 7.02). Again the difference in means did not differ reliably from zero (t = 1.51 for 338 df; .10 < p < .15). Thus, there is no support in these data for the contention that the K scale measures degree of psychological adjustment in a normal population. However, since each of the PA groups was composed of two subsets of subjects differing in level of adjustment (i.e., better adjusted vocational-educational cases vs. more poorly adjusted personal adjustment cases), it remained a possibility that differences in K might be demonstrated if more extreme comparisons were made. Accordingly, the mean K scores for the personal adjustment counseling subjects and the BA subjects were compared. For males, the mean K score for the personal adjustment subjects (N = 46) was 54.24 (SD = 8.62), whereas this mean score for the BA subjects (N = 153) was an almost identical 54.18 (SD = 8.35). For females, the mean K score for the personal adjustment subjects (N = 43) was 55.23 (SD = 6.35) and the K scale mean for the BA subjects (N = 197) was 58.02 (SD = 7.02). This difference was significant at the .05 level of confidence (t = 2.36 for 238 df). Thus, there is some evidence that K is positively related to level of psychological adjustment for females when extreme groups are compared but no evidence for this relationship with males.
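
The group comparisons above can be approximated from the reported summary statistics alone. The following is a minimal sketch, assuming an ordinary pooled-variance t test for independent groups (the text does not name the exact formula used) and taking the female group sizes of 143 and 197 from the Summary. The function name is hypothetical.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance t statistic and df from group means, SDs, and Ns."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


if __name__ == "__main__":
    # Female BA group vs. female PA group, figures as reported in the text.
    t, df = pooled_t(58.02, 7.02, 197, 56.83, 7.25, 143)
    print(f"t = {t:.2f} for {df} df")
```

Under these assumptions the result falls close to the reported t of 1.51 for 338 df; small differences are to be expected from rounding in the published means and SDs.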

Since all PA subjects were Counseling Serv- 
ice clients who had solicited help, one pos- 
sible bias in these K by adjustment level 
analyses is that such “help-seekers” would 
also tend to be uncritically frank endorsers of 
pathology (i.e., “plus getters”) on the MMPI.
Since plus-getting should be associated with 
lowered K scores, such a bias would operate 
in the direction of supporting the hypothesis 
that the more poorly adjusted subjects in this 
study would have lower K scores than nor- 
mal subjects who had not sought psychologi- 
cal help. There does not appear to be any way 
to analyze the possible effect of plus getting 
within the current data, although it can be 
noted that Hypothesis I failed to receive any 
support in the male analysis despite this pos- 
sible bias effect. It might be added that the 
lowered circumspection implicit in plus get- 
ting behavior can actually be considered a 
part of the true pathology of subjects, representing
as it does a marked lowering of the
ego defenses.


Hypothesis II: The K scale is not a measure 
of defensiveness in a normal population 


The product-moment correlation between K standard scores and scores on the Def scale was .43 for the 109 male Counseling Service subjects. Considering the only moderate test-retest reliabilities of the two scales² and the fact that considerable time typically elapsed between the two tests, a correction for attenuation was applied and this correlation became .64. Both correlations are significant beyond the .01 level of confidence. For the 103 Counseling Service female subjects, the correlation between K and Def was .26. After correction for attenuation this correlation was .35, both figures being significant beyond the .001 level of confidence. The correlation between K and Def scale scores for normal college males (N = 92) was .24 (p < .05) or .35 (p < .01) corrected for attenuation. This correlation was -.25 (p < .01) for normal college females (N = 141) or -.36 (p < .01) following correction.

2 A test-retest reliability figure of .70 was used for the K scale in all corrections for attenuation. This figure has been suggested as a best estimate (Dahlstrom & Welsh, 1960, p. 53).
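
The correction for attenuation referred to above is conventionally computed by dividing the observed correlation by the square root of the product of the two reliabilities. The sketch below assumes that standard formula; the text does not spell it out. It uses the reliabilities given in this report (.70 for K, .67 for the male Def scale, .79 for the female Def scale), and the function name is hypothetical.

```python
import math


def correct_for_attenuation(r_observed, reliability_x, reliability_y):
    """Estimate the correlation between true scores from an observed r."""
    return r_observed / math.sqrt(reliability_x * reliability_y)


if __name__ == "__main__":
    # Counseling Service females: observed r = .26, K and female Def scale.
    print(round(correct_for_attenuation(0.26, 0.70, 0.79), 2))
    # Normal college males: observed r = .24, K and male Def scale.
    print(round(correct_for_attenuation(0.24, 0.70, 0.67), 2))
```

Both cases round to the corrected value of .35 given in the text. The other corrected figures may differ by a point or two in the second decimal, presumably because the published correlations are themselves rounded.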

These sets of correlations suggest two 
things: For one, the K scale appears to be 
a better measure of defensiveness for malad- 
justed subjects in a normal population than 
for the better adjusted subjects in such a 
population. The decrease in the positive rela- 
tionship between the K scale and the meas- 
ures of defensiveness comparing the correla- 
tion for the male maladjusted group to that 
for the male adjusted group (.29) and the 
correlation for the female maladjusted group 
to that for the female adjusted group (.71) 
were both significant (p < .05 and .001,
respectively). This finding is generally
consistent with Smith’s (1959) argument that “it is
defensive for abnormal population Ss to obtain
high K scale scores but a sign of health
for normal population Ss” (p. 276), if the
reasoning is extended to maladjusted vs.
adjusted subjects within a normal population. The
second implication of these correlational data 
is that a sex difference in the psychological 
meaning of K scale performance may exist. 
In the analyses of maladjusted normal groups, 
the females provided a significantly positive 
correlation between K scale performance and 
a measure of defensiveness but one that was 
reliably lower (p < .05) than that for the 
males. When adjusted normal groups were 
analyzed, males continued to show a signifi- 
cant positive relationship between their K 
scale scores and Def scale scores, whereas fe- 
males showed the opposite pattern—the higher 
the K scale score tended to be, the lower the 
defensiveness. The reversal for adjusted col- 
lege females of the usual psychological sig- 
nificance attributed to the K scale was also 
found by Smith in his predominantly male 
group of industrial supervisors. This reversal, 
taken in conjunction with the finding that 
adjusted females showed significantly higher 
K scores than more seriously maladjusted fe- 
males, suggests that both of Smith’s hypothe- 
ses received considerably more support from 
the female data than from the male. 
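
The text reports that the drops in correlation from the maladjusted to the adjusted groups (.29 for males, .71 for females) were significant but does not say which test was used. One common choice for comparing correlations from independent samples is Fisher's r-to-z transformation, sketched below as an assumption rather than as the author's procedure. The inputs are the corrected correlations, which is what the differences of .29 and .71 imply, and the function name is hypothetical.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """z statistic and two-tailed p for the difference between two
    independent correlations, using Fisher's r-to-z transformation."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / standard_error
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_tailed


if __name__ == "__main__":
    # Males: maladjusted (r = .64, N = 109) vs. adjusted (r = .35, N = 92).
    print(fisher_z_difference(0.64, 109, 0.35, 92))
    # Females: maladjusted (r = .35, N = 103) vs. adjusted (r = -.36, N = 141).
    print(fisher_z_difference(0.35, 103, -0.36, 141))
```

Under these assumptions both differences come out significant, in line with the levels reported in the text (p < .05 for males and p < .001 for females).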

In conclusion, the data from the present 
study indicate that the K scale is positively 
related to defensiveness when more maladjusted
subjects from a normal college population
are appraised but is less positively related
or even negatively related to defensiveness
when psychologically healthy subjects
are considered. The alternative psychological 
implication of K for normal population sub- 
jects suggested by Smith—degree of psycho- 
logical health—received some support in the 
case of females but none in the case of males. 
These results suggest that the standard K cor- 
rection for the MMPI clinical scales is psy- 
chometrically advantageous in normal college 
population testing with male maladjusted sub- 
jects, somewhat less useful with maladjusted 
females and better adjusted males, and a 
source of invalidity with better adjusted fe- 
males. 


SUMMARY 
Two hypotheses taken directly from a study by Smith (1959) and indirectly from earlier investigators were tested in the present study: (a) the K scale of the MMPI is a measure of psychological health in a normal population, and (b) the K scale is not a measure of defensiveness in a normal population. To test Hypothesis I, the K scores of two samples of maladjusted male (N = 146) and female (N = 143) Counseling Service clients were compared with the K scores of male (N = 153) and female (N = 197) college normals. No significant differences were found in either comparison, although the normal female group mean K score was reliably higher than that of a subgroup of the most seriously maladjusted females (N = 43). Thus there was some support for the hypothesis that K is a measure of psychological health in a normal population in the case of females only.

Hypothesis II was tested by correlating K scale scores with specially constructed male and female Defensiveness scales for the Adjective Check List. Significant correlations of .64 for male Counseling Service subjects (N = 109) and .35 for female Counseling Service subjects (N = 103) support the assumption that the K scale is a measure of defensiveness for maladjusted subjects in a normal population. However, when these relationships were determined for normal male (N = 92) and female (N = 141) college subjects, reliably different correlations between K and the defensiveness measures were obtained (.35 and -.36, respectively). These correlational data tended to support Smith’s contention that the K scale is a better measure of defensiveness among more maladjusted subjects.


SELF-SATISFACTION AND PSYCHOLOGICAL ADJUSTMENT
IN SCHIZOPHRENICS


DENNIS K. KAMANO¹
Galesburg State Research Hospital, Illinois


It is generally recognized that the satisfac- 
tion or concern of an individual with his phe- 
nomenal self represents an important aspect 
of psychological adjustment. For example, it 
has been demonstrated that marked dissatis- 
faction with one’s self is indicative of con- 
flicts or maladjustment (Cowen, Heiliger, & 
Axelrod, 1955), while positive and self-accept- 
ing attitudes towards the self are associated 
with good psychological adjustment (Mc- 
Quitty, 1950; Rogers, 1950). Most of these 
studies, however, were confined primarily to 
normal and psychoneurotic subjects, but the 
relationship between self-satisfaction and psy- 
chological adjustment in regard to other 
classes of individuals is not clear. It follows 
that a particular formulation found to be ap- 
plicable to one class of subjects may not be 
applicable, in the same way, to other classes 
of individuals. Such paradox in conception 
may be seen in the widely accepted view that 
manifest anxiety is patently disruptive and 
maladaptive when seen in a nonhospitalized 
individual but is a prognostically good sign 
when seen in a hospitalized patient (Arieti, 
1955). Similarly, it is granted that to admit 
satisfaction with one’s self is indicative of 
good adjustment in a normal individual, but 
is it a prognostically good sign when seen in 
hospitalized schizophrenic patients? The rela- 
tionship between self-satisfaction and psycho- 
logical adjustment is a complex one, and there 
is a need for further study and a qualified 
interpretation. 

It has been widely recognized by clinicians that schizophrenic patients differ markedly in their degree of expressed self-satisfaction and adaptive potential. In contrast to normal subjects, it has been widely recognized by clinicians that with hospitalized schizophrenic patients at least, concern with one’s self represents a more adaptive behavior than self-satisfaction when this is based upon suppressive and repressive mechanisms. Some patients reveal extreme self-satisfaction, a frequently observed behavior used by patients to deny to themselves the extent of their discontent and pathology. Such patients are likely to be unrealistic, inflexible, and resistant towards any forces threatening to disrupt such rigid self-definition to such an extent that adaptive potential is grossly reduced. Much depends, of course, on the concept of adjustment one subscribes to. Most psychologists would agree in considering a suppressive, repressive mode of adaptation in hospitalized schizophrenics as less than adequate. Such behavior may represent a condition sufficient enough for a stable and benign hospital environment where pressure on the patient never becomes too great, but one which is incapable of manifesting adaptive flexibility in other situations. In a sense, such a person is adapted, as far as hospital adjustment is concerned, but not adaptable.

1 The author wishes to express his appreciation to Janet E. Drew and Vasso Vassiliou for their assistance in the collection of the data.

The above considerations led to the formu- 
lation of the following hypotheses with which 
the present study is principally concerned: 

1. Schizophrenic subjects revealing extreme self-satisfaction will tend to deny and suppress threatening features of themselves to such an extent that this will be reflected in their response to a personality evaluation. That is, schizophrenic subjects revealing extreme self-satisfaction will reveal lower recall of unfavorable personality characteristics from a passage designed to simulate a personality evaluation, as compared with schizophrenic subjects not so characterized. Implicit in this proposition is the corollary that extremely self-satisfied schizophrenic subjects will tend to reveal higher recall of favorable items consistent with their highly favorable self-concept.

2. Schizophrenic subjects revealing extreme 
self-satisfaction based upon denial and re- 
pressive mechanisms, represent a state of un- 
realistic self-appraisal and general reduction 
in their capacity for evaluation, and such re- 
duction in evaluative ability will be reflected 
in a situation requiring realistic appraisal of 
their performance from an objective frame of 
reference. That is, schizophrenic subjects re- 
vealing extreme self-satisfaction will reveal 
greater discrepancy between their level of 
performance and level of aspiration than
schizophrenic subjects not so characterized.
Schizophrenic subjects admitting some dissatisfaction
with themselves will tend to set
their level of aspiration more in relation to
their actual level of performance and reveal
lower discrepancy scores than extremely
self-satisfied schizophrenic subjects.


METHOD 


Measure of Self-Satisfaction 


There are several ways in which self-regarding attitudes can be measured. One method is to use the semantic differential technique and to index the evaluation of the self-concept along the scale provided by the subject’s own judgments of the concepts, my Actual Self (AS), my Ideal-Self (IS), and my Least-Liked Self (LLS), e.g., the distance from AS to LLS as a ratio to the total distance from LLS to IS (Osgood, Suci, & Tannenbaum, 1957). This ratio, LLS-AS/LLS-IS, was used in this study as an index of self-satisfaction. The ratio, LLS-AS/LLS-IS, approaches 1.00 as the location of AS approaches that of IS, i.e., as one’s self-satisfaction increases. In other words, the value increases in size with self-satisfaction.

These self-concepts were rated on 15 bipolar scales which were presumed to be relevant. The scales used included 6 representative of the evaluative factor (attracting-repelling, complete-incomplete, important-unimportant, healthy-sick, high-low, and sociable-unsociable), 5 for the potency factor (large-small, hard-soft, strong-weak, deep-shallow, and masculine-feminine), and 4 for the activity factor (active-passive, hot-cold, tense-relaxed, and aggressive-defensive).
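
The ratio index described above can be illustrated with a short Python sketch. Several details here are assumptions rather than statements from the text: the ratings are taken to be positions on a 7-step semantic differential scale, the per-scale ratios are averaged into one score, and a scale on which IS and LLS coincide is simply skipped because the ratio is undefined there. The function names are hypothetical.

```python
from statistics import mean


def satisfaction_ratio(actual, ideal, least_liked):
    """Distance from LLS to AS as a ratio to the distance from LLS to IS."""
    return (actual - least_liked) / (ideal - least_liked)


def self_satisfaction_score(ratings):
    """Average the ratio over scales.

    `ratings` is a list of (AS, IS, LLS) tuples, one per bipolar scale.
    Scales where IS equals LLS are skipped (an assumption; the text does
    not say how such cases were handled).
    """
    return mean(
        satisfaction_ratio(a, i, l) for a, i, l in ratings if i != l
    )


if __name__ == "__main__":
    # Hypothetical ratings on two scales, each scored from 1 to 7.
    example = [(5, 7, 1), (7, 7, 1)]
    print(self_satisfaction_score(example))
```

A score of 1.00 means the actual self is rated at the same position as the ideal self, and a score above 1.00 means the actual self is rated beyond the ideal self on the average scale.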


Subjects

Forty-four institutionalized white women labeled as schizophrenics were subjects in the present study. The subjects were screened to determine that they were sufficiently in contact and of adequate operating intelligence to understand and complete the tasks involved. Self-satisfaction scores for the 44 subjects were secured from the ratio, LLS-AS/LLS-IS, by averaging the ratio for each of the 15 scales. There were 17 subjects who received a score of 1.00 or more, while 27 subjects received a score of less than 1.00. The high self-satisfaction (HS) group was composed of the 17 subjects who received a score of 1.00 or more, while the low self-satisfaction (LS) group was composed of the 27 subjects who received a score of less than 1.00. The mean age for the HS group was 30.40 with a range of 19-38, while the mean age for the LS group was 27.30 with a range of 18-38. Both groups were matched on their immediate recall of a control passage, and the mean score for the HS group was 5.22 and for the LS group 5.12, a nonsignificant difference. Both groups were composed of chronic undifferentiated schizophrenics with only two chronic paranoid schizophrenics in the HS group and eight in the LS group.


Procedure 


It is relevant to this experiment to note that two female assistants served in the various phases of the study. The recall phase of the experiment was conducted by one assistant, while the level of aspiration phase was conducted by the other assistant. Two different assistants were used in an effort to maintain some degree of independence between the different phases of the experiment proper.

Each subject was examined individually. After a brief general discussion, the subject was presented a control passage secured from the Wechsler Memory Scale, Form 1 (1945, p. 6), to match the subjects on their immediate recall. Following the presentation of the passage, each subject rated the concepts AS, IS, and LLS on the semantic scale presented in counterbalanced order. Two matched groups of 17 HS subjects and 27 LS subjects, respectively, were secured for the experiment proper.

Recall series. This session was conducted 2-5 days after the initial phase of the study. For all subjects the following instruction was given:


Remember the ratings of yourself that you completed the last time? Well, as you know, they do reveal a lot of things about you. I have here an evaluation on what was revealed about you from the tests that you took. Of course, this will be strictly confidential. Listen very carefully while I read it to you because I want you to tell me as much about it as you can.


Following the reading of the passage, the subject was instructed to repeat as much of it as she could and was assured that it did not have to be in the exact words.

The experimenter read aloud the passages printed on a card. The experimenter practiced the reading prior to the experiment and found little difficulty in preserving uniformity from reading to reading.

The experimental passage. The experimental passage reproduced below has been subdivided into items for scoring purposes. The passage was divided so that each item would include one idea, but that the recall of one item would not automatically include the recall of another.


You are an intelligent person/but you are un- 
able to solve your problems./You are satisfied to 
do only enough to get by,/although you have the 
ability to do more./You do not always see things 
clearly,/even though you are capable of handling 
situations normally./You have good general knowl- 
edge,/and can assume responsibility./However, be- 
cause you feel insecure,/you are afraid to try new 
things./You are able to get along with people,/ 
but you are too easily offended./You could be self- 
sufficient,/but you prefer to be dependent upon 
others./ 


In order to note differences in the recall of types 
of items, each item was rated as “favorable” or “un- 
favorable.” The ratings were made by the author and 
two other independent raters. The percentage agree- 
ment between the three independent raters was 
84.30%. In the case of discrepancies, the final rating 
was decided upon after joint conference. 

There were a total of 14 items, 7 favorable and 7
unfavorable.


Scoring. Each item recalled was assigned a weight 
of unity. An item was scored as a recall if reproduced
accurately or if the idea itself was reproduced
accurately. Inaccurate reproductions of items
were not scored.

Level of aspiration series. Five days after the 
recall series, the Digit Symbol test secured from 
the Wechsler-Bellevue Intelligence Scale (Wechsler, 
1944) was administered to each subject individually 
with the standard instruction provided by the manual, 
together with a 1 minute time limit. After the first 
performance, the following instruction was given:
“You made a score of ____ on the test in 1 minute.
Let’s try it again. Here is another test which is done
in the same way, but which uses different marks.” 
An equivalent form of the Wechsler-Bellevue Digit 
Symbol test was used. After the subject completed 
the samples, the experimenter said: “How many of 
these do you think you will be able to do in 1 
minute?” After recording the estimate, the subject
was allowed to perform on the test.

Scoring. The deviation of estimate from performance was designated as the “D score.” In each case the D score was the difference between the performance or actual score made and the estimate following it. When a subject estimated higher than the score she had earned, her D score was designated as positive. Whenever a subject estimated lower than the previous performance, her D score was negative. The D score provides not only a measure of the subject’s aspiration but also of her adjustment to the reality of her own performance. A low D score implies somewhat better contact between goal and accomplishment.

A second D score utilized was the difference between the estimate just made and the performance following it. This measure reflects the level of success and failure in relation to the subject’s goal setting.

Because of apparent skewness in the D scores, the results were analyzed by the nonparametric Mann-Whitney U test.
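
The discrepancy scores defined above can be written out as a small Python sketch. The function name and dictionary keys are hypothetical. The sign convention for the first D score follows the text (positive when the estimate exceeds the preceding performance), and the convention for the second follows the description of Table 3 (positive when Trial 2 exceeds the estimate).

```python
def d_scores(trial1, estimate, trial2):
    """Signed discrepancy scores for one subject on the Digit Symbol tasks."""
    return {
        # Estimate relative to the preceding performance.
        "trial1_to_estimate": estimate - trial1,
        # Subsequent performance relative to the estimate.
        "estimate_to_trial2": trial2 - estimate,
        # Change in performance between the two trials.
        "trial1_to_trial2": trial2 - trial1,
    }


def absolute_d_scores(trial1, estimate, trial2):
    """The same discrepancies with signs disregarded."""
    return {k: abs(v) for k, v in d_scores(trial1, estimate, trial2).items()}


if __name__ == "__main__":
    # Hypothetical subject: 25 on Trial 1, estimates 30, scores 27 on Trial 2.
    print(d_scores(25, 30, 27))
    print(absolute_d_scores(25, 30, 27))
```

Keeping the signed and unsigned versions separate mirrors the two uses made of the scores: size of discrepancy for the group comparison, and direction of discrepancy for the percentage analysis.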


RESULTS AND DISCUSSION 

Recall Series 

In Table 1 the HS and LS groups are compared on the number of items recalled from the passage simulating a personality evaluation. Our data indicate that the HS group recalled significantly fewer items than the LS group, both in total recall and in the recall of items reflecting unfavorable personality characteristics. There was no significant difference in the recall of favorable items between the two groups. However, for the HS group alone, more favorable items were recalled than unfavorable items. It appears that, as predicted, the schizophrenic subjects with extremely high self-regarding attitudes tended to deny and suppress unfavorable personality features of themselves to such an extent that this was reflected in their performance. Since schizophrenic subjects with extreme self-satisfaction tend to resist self-evaluation or any external promptings that may disrupt such self-definition they have adopted, the threat of the test situation undoubtedly contributed not only to the reduction in recall of unfavorable items but in the recall of favorable items as well. It is possible that, once developed, such self-definition resists change and represents a prognostically poor sign for therapy. The question of therapy with such subjects invites further study.


TABLE 1

COMPARISON OF MEAN RECALL SCORES

                 HS recalls     LS recalls
                 (N = 17)       (N = 27)
Items            Mean
Favorable        1.35
Unfavorable       .94
Total            2.29

* Significant at the .05 level, one-tailed test.


Level of Aspiration Series


To begin with, the HS and LS groups were 
compared on their initial performance on the 
Digit Symbol test. The HS group obtained a 
mean score of 25.30 and the LS group a mean 
score of 26.00, a nonsignificant difference. In 
this respect, the two groups were matched in 
their performance on the Digit Symbol test, 
and this lessened the possibility of the effects 
of differential ability on the results of this 
phase of the study. 

The discrepancy between performance and estimate (D score), with signs disregarded (i.e., positive or negative direction of estimates), provided a measure of each subject’s adjustment to the reality of her own performance. The results of this analysis are given in Table 2. D scores secured from the discrepancies between Trial 1 and estimate revealed significantly higher D scores for the HS group as compared with the LS group. There were significantly greater discrepancies between the actual scores made on the first trial and the estimates following it for the HS group as compared with the LS group.


TABLE 2

MEAN DISCREPANCIES AND RESULTS OF APPLYING MANN-WHITNEY U TEST TO ARRAYS OF DISCREPANCY SCORES BETWEEN THE HS AND LS GROUPS

                           HS group     LS group
D score                    Mean         Mean
Trial 1 and estimate
Estimate and Trial 2
Trial 1 and Trial 2

3.60

* Significant at the .05 level.
** Significant at the .01 level.


D scores secured from the discrepancies between the estimates and subsequent performances (Trial 2) also revealed significantly higher D scores for the HS group as compared with the LS group. The discrepancies between the estimates made and the performances following it were significantly greater for the HS group than for the LS group. Analysis of the differences between Trial 1 and Trial 2 on the Digit Symbol test yielded no significant differences between the two groups.
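
For readers unfamiliar with the Mann-Whitney U test applied above, the statistic itself can be computed by direct pair counting, as in the sketch below. The function name and the example values are hypothetical and are not data from this study; judging significance would still require tables or a normal approximation, which are not shown.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two independent samples.

    U_x counts the pairs in which a value from x exceeds a value from y,
    with ties counted as one half; U_y is the complementary count.
    """
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


if __name__ == "__main__":
    # Hypothetical absolute D scores for two small groups.
    high_group = [6, 8, 9, 12]
    low_group = [2, 3, 5, 6, 7]
    print(mann_whitney_u(high_group, low_group))
```

Because the test uses only the ordering of the scores, it is not disturbed by the skewness in the D scores that the author mentions as the reason for choosing it.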

Table 3 presents an indication of the percentages of subjects in the HS and LS groups in terms of the direction of their discrepancy scores. The D score between Trial 1 and estimate was designated as positive if the estimate following Trial 1 was higher, negative if lower, and zero if there was no change. The D score between estimate and Trial 2 was designated as positive if Trial 2 following the estimate was higher, negative if lower, and zero if there was no change. Similarly, the D score was designated as positive if Trial 2 was higher than Trial 1, etc. Analyses of the percentages of directional differences by the chi square method revealed no significant differences between the two groups in the three conditions. That is, there were no significant differences between the HS and LS groups in the percentages of subjects showing positive or negative D scores.


TABLE 3

DIRECTION OF DISCREPANCY SCORES

HS group     LS group     Chi square

The results of this phase of the study indicate that the differences
between the HS and LS groups lay not in actual performance or in the
direction of the D score, but in the setting of estimates or aspiration
levels. Although there were no significant differences between the two
groups in the percentages of subjects revealing positive or negative
D scores, the HS group, as contrasted with the LS group, showed
significantly larger discrepancy scores between actual performance and
estimate, whether in the positive or negative direction. The LS group,
as contrasted with the HS group, was generally less extreme in setting
its estimates in either the positive or negative direction.


Since the discrepancy score represents the
subject’s adjustment to the reality of her
own performance, the LS group was in better
contact between goal and accomplishment
than the HS group. In a sense, the LS group 
was capable of manifesting adaptive flexibil- 
ity in such situations to a greater degree than 
the HS group. 

Further comment appears indicated in regard to the composition of the
HS group of this study, since it would seem anomalous for someone to
evaluate his actual self higher than his ideal-self, as did some of the
schizophrenic subjects in the HS group. It would, indeed, be very
unusual for someone to evaluate his actual self higher than his
ideal-self, but such occurrences should be expected from some
hospitalized schizophrenic patients. Frequently observed in the clinical
setting are schizophrenic patients who reveal extremely high
self-regarding attitudes and who deny to themselves and to others
the extent of their pathology and discontent.
Such schizophrenic subjects have an unusu- 
ally high self-concept which, of course, repre- 
sents an unrealistic self-appraisal. It should 
not be surprising, then, that some of the 
schizophrenic subjects comprising the HS 
group rated their actual selves higher than 
their ideal-selves. However, further study is 
needed to clarify the significance of such 
discrepancies reflected in the self-satisfaction 
index of such subjects. 


SUMMARY 


The present study sought to test two hy- 
potheses: (a) schizophrenic subjects reveal- 
ing extreme self-satisfaction tend to deny and 
suppress threatening features of themselves 
to such an extent that they will recall fewer
items reflecting unfavorable personality
characteristics from a passage designed to simulate
a personality evaluation than schizophrenic
subjects admitting some dissatisfaction with
themselves, and (b) schizophrenic subjects
revealing extreme self-satisfaction will reveal 
greater discrepancy between their level of 
performance and level of aspiration than 
schizophrenic subjects not so characterized. 
Both hypotheses were supported when tested 
on a sample of 44 hospitalized schizophrenic 
women. Implications were drawn with regard 
to psychological adjustment. 
One of the key problems in the employment of projective techniques is
the interpretation of the content of a TAT story with reference to
possible behavioral correlates. For example, assume “A” and “B” both
manifest an equal amount of hostility in their protocols to two
different series of TAT pictures. Assume further that the hostility
score employed is a sophisticated one taking cognizance of the
inhibitions to aggression in the stories as well as of the direct
expression of hostility. Does the equality of scores for A and B provide
any foundation for the prediction that the behavioral correlates of
their scores should be the same? Hardly, unless we can ascertain that
the hostility cards in the series for each person are equal in their
hostility-educing properties, and that the overall scores represent
equal deviations from the stimulus properties of each card for each
story. This latter statement is necessary since if A obtained his
hostility score from telling chiefly hostile stories to nonhostile cards
and nonhostile stories to hostile cards, the interpretative significance
might differ from that attributed to B, who followed the stimulus
properties of the cards closely in giving hostile stories to hostile
cards and nonhostile stories to nonhostile cards. Thus, the implications
for interpretation may differ even though the overall scores are
equivalent. This topic has been dealt with more fully elsewhere
(Murstein, 1961).

It is apparent that the relationship of responses on the TAT to the
stimulus qualities of the cards may have important behavioral correlates
which are helpful in the assessment of personality. Though several
studies have been undertaken applying scaling techniques to thematic
cards (Auld, Eron, & Laffal, 1955; Lesser, 1958), none have scaled the
TAT in its entirety. Moreover, the previous studies employed only the
Guttman technique. One might ask whether a series of scaling devices
currently employed in measuring attitudes could be used to determine the
stimulus properties of the entire series of TAT cards. If the stimulus
value of the cards could be ascertained, then the relationship between
the subject’s response to the cards and the stimulus value might provide
meaningful inroads into the study of personality. Our specific question
therefore was formulated as follows: can a scale of hostility be
constructed by each of the following scaling devices: Thurstone Equal
Appearing Interval method (EAI); Successive Categories method (SC);
Likert method; Edwards Scale Discrimination technique; and the Stouffer,
Borgatta, Hays, and Henry H-technique?


PROCEDURE


Subjects were 100 University of Portland undergraduate students,
obtained from the various sections in general psychology. There were an
equal number of men (50) and women (50) for the Thurstone procedures.
The Likert, Edwards, and Stouffer methods were applied to 42 men and 58
women. The same subjects participated in each scaling method with the
exception that there were eight men missing and eight additional women
present for the Likert, Guttman, Edwards, and H-technique procedures.

The EAI and SC methods differ primarily in that the former assumes that
equal intervals exist for the 11 sorting categories while the latter
method does not make this assumption but instead measures the actual
width of the intervals between the cards.

The data for both methods, however, may be obtained in the same manner.
Accordingly, groups of four to six subjects were instructed to stand
before a table upon which were placed nine sheets of white paper,
numbered one through nine, to represent nine categories of judgment.
They were presented with the 31 TAT cards in random order and given
these instructions:


You will be shown a series of 31 pictures which you are to judge
objectively for the amount of hostility shown. By hostility I mean
unfriendliness, anger, the desire to hurt either physically or mentally.
The expression of hostility can vary from barely noticeable to extremely
intense. It may be directed towards people, animals, objects, or nothing
in particular. Your task is to judge the 31 cards according to the
amount of hostility shown on each card. In front of you are numbers from
one to nine which represent a continuum from the least amount of
hostility to the greatest amount. Thus, the least hostile card would be
put in pile Number 1 while the most hostile card would be put in pile
Number 9.

Pile Number 5 represents the midpoint category separating the more than
average hostile cards from the less than average hostile cards. For
example, a card which seemed more hostile than average, but not
extremely hostile, might be put in a pile higher than the midpoint but
yet not in one of the extreme piles.

Remember you are to judge these cards according to the amount of
hostility they objectively possess, not how you personally feel about
them. Do not forget to judge the blank card. Are there any questions?


The judgments were tabulated on a data sheet after each subject had
completed the task. The data for the Likert, Edwards, and H-techniques
were obtained in a group situation. With about 30 subjects seated in a
classroom on each occasion, subjects were given the following
instructions:


Now I would like you to look over each card individually and tell me how
hostile it seems to be [hostility was defined in the same way as it was
defined in the instructions for the aforementioned methods]. For each
card I show you, check to see that you have the right number of the card
as I call it out and then, looking at the picture, circle the phrase
which best describes how hostile the picture appears. You have five
choices: (1) very hostile, (2) fairly hostile, (3) undecided, (4) little
hostility, (5) not hostile.


Slides containing facsimiles of the 31 TAT cards were presented to the
three groups of approximately 30 students, one slide at a time. Each
slide was projected on the screen for [number illegible] seconds, during
which time the subject recorded his judgment. To roughly control for
serial position, one group received the pictures in numerical order, the
second in inverse numerical order, and the third starting with the
middle numbered card and successively presenting the following cards in
descending order from each side of the middle card. Thus, Card 11, the
sixteenth card in the series of 31 cards, was presented first, followed
by Card 10 (fifteenth card) and then by Card 12M (seventeenth card).


RESULTS 


Since there appeared to be little difference 
between the judgments of the sexes, the suc- 
cessive categories values for both sexes were 
correlated. The r of .93 indicated no reason 
why the judgments could not be viewed as 
coming from the same population of judg- 
ments. Accordingly, the median scale values 
(S) for the EAI, SC, mean Likert values, and 
the EAI interquartile deviations (Q) for each 
card for the total group, are listed in Table 1 
together with the Likert t values.

TABLE 1

EQUAL APPEARING INTERVAL, SUCCESSIVE CATEGORY, ... AND t ...

[Title only partly legible and table body not recoverable from the source.]

A test of internal consistency was applied to the values obtained via SC
to determine whether the assumptions for scaling were supported. These
assumptions were: (a) the projection of the cumulative proportion
distributions for the various cards is normal on the psychological
continuum; (b) the psychological dimension scaled is unidimensional; and
(c) the standard deviations of the discriminal dispersions are equal. In
using the χ² test, a fourth assumption made is that there is zero
correlation among the stimuli, since the proportions used must be
independent of each other (Edwards, 1957; Guilford, 1954). The χ²
formula suggested by Guilford (1954, p. 232) was employed, which, due to
the large number of degrees of freedom, 210, was transformed into an
approximate t ratio. The t value was 24.20, which was highly
significant, beyond the .001 level. It is apparent that one or more of
the multiple assumptions made in scaling the cards was unjustified. To
determine how important this finding was, we obtained the size of the
discrepancy between the theoretical and empirical proportions of
judgments for each of the cards in the various categories. The
theoretical proportions were obtained by taking the scale value of each
of the 31 cards and subtracting it from each of the cumulative interval
widths (Edwards, 1957). This yielded a 31 × 8 matrix of theoretical
deviates with the columns representing the cumulative interval widths
and the rows the various cards. By reference to the table of the normal
curve these values were transformed into theoretical cumulative
proportions. Each of these proportions, based only on the knowledge of
the interval widths and the scale values of the cards, was compared with
its empirical counterpart. The average value for the discrepancy between
the 248 theoretical and empirical proportions (31 cards × 8 categories)
was .038. This value is not exceedingly large, and indicates that the
degree of lack of internal consistency in the scaling was not great
although the confidence in the significance of the disparity is very
high.
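The comparison of theoretical and empirical proportions can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the original computation: the function names, the three scale values, the four cumulative interval widths, and the "empirical" proportions are all made up for the example, and the unit normal curve is taken from Python's standard library rather than from a printed table.

```python
from statistics import NormalDist


def theoretical_proportions(scale_values, boundaries):
    """Return one row per card: the cumulative proportion of judgments
    expected at or below each cumulative interval width (boundary)."""
    cdf = NormalDist().cdf  # unit normal curve
    # Theoretical deviate = boundary minus the card's scale value.
    return [[cdf(b - s) for b in boundaries] for s in scale_values]


def mean_abs_discrepancy(theoretical, empirical):
    """Average absolute difference over every card-by-boundary cell."""
    diffs = [
        abs(t - e)
        for t_row, e_row in zip(theoretical, empirical)
        for t, e in zip(t_row, e_row)
    ]
    return sum(diffs) / len(diffs)


# Hypothetical inputs (3 cards, 4 boundaries), not the study's data.
scale_values = [0.5, 1.2, 2.0]
boundaries = [0.0, 0.8, 1.6, 2.4]

theory = theoretical_proportions(scale_values, boundaries)
# Pretend every observed proportion ran .02 above its theoretical value.
empirical = [[p + 0.02 for p in row] for row in theory]

print(round(mean_abs_discrepancy(theory, empirical), 3))
```

With the full data the same two functions would take 31 scale values and 8 boundaries, giving the 248 cells mentioned in the text.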

TABLE 2

EQUAL APPEARING INTERVAL SCALE VALUES, Q VALUES, AND t VALUES FOR NINE
CARDS SELECTED FOR TEST OF UNIDIMENSIONALITY

[Table body not recoverable from the source.]

The Likert values were obtained by taking those individuals whose
overall hostility score placed them in the top quartile and comparing
their scores for each card with those persons whose score placed them in
the lowest quartile. A t value was then obtained between the groups with
regard to their scores on each card. Table 1 indicates that 11 of the 31
cards proved to be significantly differentiated at the .05 point, 18 at
the .01 point, and only 2 proved to be not significant at all.
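The quartile comparison just described might look as follows in Python. This is a hedged sketch: the text does not say which t formula was used, so the unpooled standard error below is an assumption, and the function name and the twelve made-up subjects are purely illustrative.

```python
from math import sqrt
from statistics import mean, variance


def likert_item_t(scores, item):
    """t ratio for one card between the top and bottom quartiles.

    scores: one list of card ratings per subject.
    Subjects are ranked by their total score over all cards.
    """
    ranked = sorted(scores, key=sum)
    k = len(ranked) // 4                      # size of each quartile
    low = [s[item] for s in ranked[:k]]
    high = [s[item] for s in ranked[-k:]]
    # Assumed formula: difference in means over an unpooled standard error.
    se = sqrt(variance(high) / k + variance(low) / k)
    return (mean(high) - mean(low)) / se


# Hypothetical ratings for 12 subjects on 3 cards (not the study's data).
scores = [
    [1, 1, 2], [1, 2, 2], [2, 1, 3],
    [2, 2, 3], [3, 2, 2], [2, 3, 3], [3, 3, 2], [3, 3, 3], [3, 4, 3],
    [4, 4, 3], [5, 4, 2], [5, 5, 3],
]

t_card0 = likert_item_t(scores, 0)   # card that separates the quartiles
t_card2 = likert_item_t(scores, 2)   # card that barely separates them
print(round(t_card0, 2), round(t_card2, 2))
```

A card with a large t separates high from low hostility perceivers; a card with a t near zero does not, whatever its scale value.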

The Edwards Scale Discrimination Tech- 
nique was utilized as follows: 

1. The 15 cards above the median Q value
were discarded. 

2. The cards which were most representa- 
tive of the entire range of S values obtained 
from the EAI method were selected. 

3. Those cards which did not possess highly
significant (p < .01) t values as obtained
from the Likert method also were
discarded.


TABLE 3

RESPONSES TO CARD 13MF AS RELATED TO THE OVERALL SCORE FOR THE NINE CARDS

Total score categories: 17-18, 15-16, 14, 13, 12, 11, 10, 8-9, 1-7.
[Response frequencies for Card 13MF are not recoverable from the source.]

The result of this analysis was the selection of nine cards which were
to be tested for unidimensionality via the coefficient of
reproducibility. These cards from low to high stimulus value of
hostility were 10, 13G, 13B, 7GF, 6GF, 9GF, 3GF, 18BM, and 13MF. The
EAI S, Q, and t values for each of these cards are shown in Table 2.

TABLE 4

CUTTING POINT SELECTED FOR CARD 13MF WHICH MEETS CRITERIA FOR SELECTION
VIA H-TECHNIQUE

[Table body only partly legible in the source. Rows split the score for
judgment of all nine cards at 10; error cells are marked, and the note
states that the split shown fulfills the two selection criteria.]

The Stouffer, Borgatta, Hays, and Henry H-technique (1952) was used to
determine the coefficient of reproducibility. The Likert responses (very
hostile, fairly hostile, undecided, little hostility, not hostile) were
collapsed into three judgments: hostile, undecided, and not hostile,
which received weights of 2, 1, and 0, respectively. The distribution of
total scores was then obtained for each of the nine cards, using the
weights assigned to the three response categories. Table 3 indicates, by
way of example, the relationship of the total response score on the nine
cards to the score obtained for Card 13MF. Upon inspection of this table
the best cutting points for the two possibilities, 0, 1 vs. 2 and 0 vs.
1, 2, were selected, bearing in mind two criteria. These were (a)
neither error cell has a higher frequency than the smaller of the two
frequencies on the principal diagonal (nonerror), and (b) the sum of the
frequencies in the two error cells should be less than 30% of the total
frequency. An example of one of these splits for Card 13MF is shown in
Table 4.
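The two criteria for an acceptable split can be written as a small check on a 2 × 2 table. The sketch below is illustrative only: the function and argument names are invented, the mapping of the five Likert phrases onto the three weights is an assumption about how the collapsing was done, and the cell frequencies in the examples are made up.

```python
# Assumed collapsing of the five Likert phrases into the weights 2, 1, 0.
COLLAPSED_WEIGHT = {
    "very hostile": 2,
    "fairly hostile": 2,
    "undecided": 1,
    "little hostility": 0,
    "not hostile": 0,
}


def cut_is_acceptable(hit_high, err_high, err_low, hit_low):
    """Check one split of a card against a split of the total score.

    hit_high, hit_low: frequencies on the principal diagonal (nonerror).
    err_high, err_low: frequencies in the two error cells.
    """
    total = hit_high + err_high + err_low + hit_low
    smaller_hit = min(hit_high, hit_low)
    # (a) neither error cell exceeds the smaller nonerror cell
    criterion_a = err_high <= smaller_hit and err_low <= smaller_hit
    # (b) error cells together are under 30% of all cases
    #     (integer arithmetic avoids floating-point edge cases)
    criterion_b = 10 * (err_high + err_low) < 3 * total
    return criterion_a and criterion_b


# Hypothetical frequencies for 100 subjects.
print(cut_is_acceptable(40, 8, 7, 45))    # both criteria met
print(cut_is_acceptable(45, 25, 20, 10))  # fails (a)
print(cut_is_acceptable(35, 20, 10, 35))  # fails (b): errors are exactly 30%
```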

TABLE 5

TAT CARDS AND CUTTING POINTS USED IN THE CONSTRUCTION OF CONTRIVED CARDS

N = 100

[Column headings legible in the source: Response, Weight, Positive,
Frequency of hostile. The table body is not recoverable.]

By using more than one split with each card the nine original cards were
condensed into four “contrived” cards. Each contrived card contained a
“triplet,” i.e., three cards with prescribed cutting points which
indicate which judgment or judgments are to be considered as a “hostile”
choice, and which “nonhostile.” Since there are two possible adjacent
cutting points, (0) vs. (1, 2) and (0, 1) vs. (2), each card could be
used more than once if desired, using a different cutting point in each
case. In order for a contrived card to be judged hostile, the responses
to two or more of the members of the triplet must be judged hostile.
With the number of possible scale types limited to five, the resulting
coefficient of reproducibility was .965. This value is slightly inflated
due to the fact that in choosing our cutting points we have taken
advantage of favorable sampling errors. Nevertheless, the coefficient is
sufficiently high to conclude that the responses to the cards can be
reproduced with a satisfactorily high degree of accuracy from a
knowledge of the total scores alone.

It should be noted that not all cards selected were able to fulfill
Condition a mentioned above. Those cards not meeting this criterion,
along with those not meeting other criteria for selection, are listed in
Table 5 along with the various cutting points, response category
weights, the frequency (out of 100 subjects) with which the card in a
particular split was judged hostile, and the contrived card into which
the card was placed.

It is evident from examining this table that 
it was not possible to obtain a good split for 
a card which is judged hostile by 50% of the 
subjects. This failure is similar to that usu- 
ally experienced with the Thurstone meth- 
ods in attempting to get good differentiating 
items, or items with low Q values which at 
the same time represent the middle of the 
psychological continuum. 

With four contrived cards there are 16 pos- 
sible scores or types of response patterns, of 
which 5 may be designated as scale types and 
11 nonscale types. The frequency with which 
each type was found is listed in Table 6. 
Examination of this table indicates that only 
7 persons out of 100 are nonscale types. 
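The count of 16 response patterns, 5 of them scale types, follows directly from the logic of a cumulative scale and can be checked by enumeration. The sketch below assumes the four contrived cards are ordered from most to least frequently judged hostile, so that a scale type is any pattern in which a hostile judgment never follows a nonhostile one; the function names are illustrative. It also shows the two-out-of-three rule for scoring a contrived card.

```python
from itertools import product


def contrived_response(triplet):
    """A contrived card is hostile (1) when two or more of its three
    dichotomized member judgments are hostile."""
    return int(sum(triplet) >= 2)


def is_scale_type(pattern):
    """Pattern over cards ordered from most to least often judged hostile.
    A perfect scale pattern never rises from 0 back to 1."""
    return all(a >= b for a, b in zip(pattern, pattern[1:]))


patterns = list(product((0, 1), repeat=4))          # all response patterns
scale_types = [p for p in patterns if is_scale_type(p)]

print(len(patterns), len(scale_types), len(patterns) - len(scale_types))
# prints: 16 5 11
```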

Last, the values obtained for the 31 cards 
via the EAI, SC, and Likert methods were in- 
tercorrelated. The resulting correlations were: 
EAI vs. SC, .99; EAI vs. Likert, .94; and
Likert vs. SC, .92.


DISCUSSION 


Our data answer a few questions and, like
much research, raise many others. Apparently,
the TAT cards are readily scalable by a multitude
of scaling methods. The Thurstone methods
yielded a fairly representative range of
the dimension of hostility with Q values not 
exceedingly higher than those often obtained 
for attitude statements. The coefficient of re- 
producibility of .965 obtained by the H-tech- 
nique method also compares favorably with 
those usually obtained with attitude state- 
ments. 

There are, however, further considerations. 
It is apparent from a perusal of Table 1 that 
the differential ability of the cards with re- 
gard to separating high hostility perceivers 
from low hostility perceivers is not greatly 
dependent upon the scaled value of the card. 
There are several instances where two cards 
are perceived as nearly equivalent on the di- 
mension of hostility and yet one card is able 
to differentiate the aforementioned high and 
low hostility perceivers while the other is not. 
For example, Card 10 is given an EAI scaled 
value of 1.27 which is nearly identical to the 
scaled value of 1.19 received by Card 16. The 
t value of Card 10, however, was 4.67 while
that of Card 16 was .55. There are several 
possible explanations. One is that some cards 
contain many possible dimensions in their 
stimulus characteristics. High hostility per- 
ceivers (upper quartile in the Likert method), 
however, are more sensitive to the hostile pos- 
sibilities of the card than to other motives 
such as achievement, sex, and affiliation, to 
name just a few. Low hostility perceivers 
(lower quartile in the Likert method), how- 
ever, are probably either disposed to avoid 
seeing hostility or perhaps simply able to per- 
ceive the other dimensions as more strongly 
characterizing the picture. Card 10, which is 
described by Murray (1943) as “a young 
woman’s head against a man’s shoulder” (p. 
19) would seem to be a multidimensional pic- 
ture. There are many plausible explanations 
for the embrace, some positive, others nega- 
tive. Card 16, however, is a picture devoid of 
any motives from a stimulus point of view. 
Hence, high hostility perceivers cannot choose 
any alternative except to perceive the picture 
as nonhostile. To do otherwise is to deviate 
sharply from the stimulus possibilities of the 
card. The low hostility perceivers likewise 
would be naturally expected to perceive little 
hostility. 

It is thus conceivable that the number of 
alternative themes that can be perceived in a 
picture will determine its differential ability. 
The greater the number of possible themes in 
a card, the greater the differentiation between 
the judgments of persons high and low on one 
of the dimensions of the card. 

It also follows that the greater the number 
of themes, the less likely a single motive is 
to receive a uniformly high judgment from 
all subjects. The reason for this assumption 
is that not all subjects will perceive the domi- 
nant motive since they may be more sensi- 
tized to other motives. The result is that the 
overall saliency of a motive is not only a 
function of the stimulus impact of the mo- 
tive, but a function of the number of com- 
peting motives as well. 

How then can one be sure as to which of 
these methods of judging is employed by the 
subject? One answer is to have individuals 
judge a picture for all possible motives. The 
perceptual ambiguity level could then be 
determined (Kenny, 1961) by the equation $A = 1 - \sum_i p(i)^2$, where
A equals the perceptual ambiguity of the picture and p(i) equals the
proportion of any motive i appearing in the picture.

A will be at a maximum when the proportions of all motives are equal
(i.e., Motive A = .33, Motive B = .33, Motive C = .33; A = .67). A will
decrease as the split between the proportions of two or more motives
widens (i.e., Motive A = .50, Motive B = .45, Motive C = .05; A = .45).
Current work on the differentiating value of the cards as a function of
A is underway and will be reported in future articles.
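The ambiguity index is easy to evaluate once the motive proportions are known. The following sketch assumes the equation is read as one minus the sum of squared proportions, a reading that reproduces the first worked value of .67; the function name is illustrative. Under this reading the second set of proportions evaluates to about .545, so the second printed value should be treated with some caution.

```python
def perceptual_ambiguity(proportions):
    """A = 1 - sum of squared motive proportions (assumed reading)."""
    return 1.0 - sum(p * p for p in proportions)


equal_split = perceptual_ambiguity([1 / 3, 1 / 3, 1 / 3])
uneven_split = perceptual_ambiguity([0.50, 0.45, 0.05])

print(round(equal_split, 2))   # 0.67
print(uneven_split < equal_split)  # True: ambiguity falls as the split widens
```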

Another problem with scaling procedures is that they usually have failed
to incorporate the presence of inhibitory factors into the scaled
values. If two people perceive an equal amount of hostility yet differ
in the inhibitions expressed with regard to this hostility, their
personalities might differ radically. Yet, our scaled values along with
all other studies involving scaling of thematic stories would fail to
take cognizance of this fact. It would seem, therefore, that the
prediction of overt behavior from a knowledge of the scaled value of
pictures might be improved if the scaled values reflected the
multidimensionality of the stimulus properties of the picture. Perhaps
some of the newer multidimensional scaling devices (Torgerson, 1958) may
prove to be of greater value than the older methods.

The large variability of the Q values might tend to make one believe
that Q might be regarded as an index of projection for a picture.
Pictures with low Q values might be poor pictures for projective
purposes while a high amount of variability for the objective
dimensionality of a picture (high Q) might be considered a good index of
projection. Little support, however, is given for the belief that high
and low perceivers of hostility are differentiated by high Q variables
if one examines Table 1. Two factors may serve to explain this result.
First, a low Q value for the judgments of one dimension may become a
high Q value when other dimensions are considered in the same picture.
Secondly, a card may not be relevant for a given dimension. Accordingly,
high Q values may result not because individuals differ as to the
placement of the card on a dimension, but because they differ as to
whether or not the picture belongs in the dimension continuum at all.
Under these circumstances, a high Q may merely reflect a high amount of
error variance in the judgment due to the random assignment of values to
a card when it does not seem applicable to a dimension. Thus, for Q to
be regarded as an accurate measure of projection we should ascertain
that the card is relevant to the dimension being judged and that it taps
this dimension only. Since it is doubtful that the current TAT cards
meet both criteria completely, the use of Q as a projective index would
seem to have some drawbacks.


There are obviously a good many difficulties in the scaling of thematic
cards and the question may arise as to whether it is worth all of the
trouble. Is a knowledge of the stimulus value important? Cannot a
skilled clinician “get along” without knowing the stimulus value of the
cards? It is our belief that behavior may be viewed as the pooled
interaction of stimulus, background, and organismic variables, a view
very close to that expressed by Helson (1955). From this frame of
reference, knowledge of the stimulus properties seems essential to the
accurate prediction of behavior. The “good” clinician probably carries
in his head a normative index of responses to each of the TAT pictures
which serves as a rough estimate of the stimulus value. But the
clinician, limited by his own experience, probably could not achieve the
accuracy of estimation that the actual measurement of a well
standardized group would achieve.
Yet another important factor is the determination of the relationship
between the stimulus properties of a card, the response elicited, and
the possible behavioral correlate. This important area has been
untouched by psychologists because of a lack of knowledge of the
stimulus properties of the cards (Murstein, 1959). It should be possible
shortly to determine the relationship, if any, between perceptual
deviancy and behavioral deviancy. It is not assumed that maladjustment
is a simple linear function of the discrepancy between the stimulus
properties of a picture and the story told to it. There may be a
curvilinear, hyperbolic, or parabolic relationship, or perhaps no
relationship. The determination of an answer to this important question
only awaits our quantification of the stimulus properties of our
projective instruments.

In closing, it should be emphasized that the values obtained may not
hold for other kinds of students at other locales. In fact, serial
position effects with each scaling method as well as between the methods
have not been adequately controlled. To do so would have involved a
great number of subjects, which, while desirable, was not practical, nor
directly pertinent to the rather broad purpose of this study. Such
refinements should, however, be utilized where the scale values
themselves are of concern rather than the question of whether scaling
itself can be achieved.


SUMMARY 


The purpose of this study was to determine 
whether the entire set of 31 TAT cards could 
be scaled for the dimension of hostility 
through the use of several widely used scal- 
ing methods. 

A group of 100 undergraduate psychology 
students were administered the TAT cards 
via slide projections and asked to judge the 
slides with regard to the dimension of hos- 
tility. The judgments were scaled by the 
Thurstone Equal Appearing Interval and Suc- 
cessive Category methods, the Likert method, 
Edwards Scale Discrimination technique, and
Stouffer, Borgatta, Hays, and Henry H-technique.
By employing various criteria such as
adequate range coverage, and differential ability
between high and low hostility perceivers,
eight cards were finally selected. These were
10, 17BM, 7GF, 6GF, 9GF, 3GF, 18BM, and
13MF. The coefficient of reproducibility for
these cards using the H-technique method of
“contrived cards” was .906. It was concluded
that all of the aforementioned methods could
be used in scaling the dimension of hostility.
The implications of the results with regard
to future work in the area of personality were
discussed.
THERAPISTS’ JUDGMENTS CONCERNING PATIENTS CONSIDERED FOR PSYCHOTHERAPY 1
In recent years, an increasing amount of attention has been devoted to
the problem of duration of stay in psychotherapy. Reports from many
diverse types of clinical settings have indicated that early
discontinuation in outpatient psychotherapy is a reliable finding of
some importance (Affleck & Mednick, 1959; Garfield & Kurz, 1952; Rogers,
1960). Several studies have attempted to appraise selected patient
variables related to and predictive of continuation in psychotherapy
(Rosenthal & Frank, 1958; Rubinstein & Lorr, 1956; Sullivan, Miller, &
Smelser, 1958; Taulbee, 1958). With the possible exception of
educational level, the findings on most of these variables have been
inconsistent. Research in our setting on patient attributes related to
termination indicated that diagnosis, sex, age, and education were not
significantly related to duration of stay (Gar- 
field & Affleck, 1959). Education below a cer- 
tain minimal point may be important, but we 
found no evidence to indicate its usefulness 
as a predictor with persons who have gone 
beyond the eighth grade in school. 

The general failure to relate broad patient 
variables to attrition led to an interest in the 
therapist as a variable affecting attrition rates. 
While it is apparent that the interaction be- 
tween the individual patient and the indi- 
vidual therapist is exceedingly important for 
the problem of attrition, we were interested 
first in getting a better understanding of the 
orientation that therapists have toward candi- 
dates for therapy in general. Are there com- 
mon points of view toward patients? What 
patients are initially viewed in a highly fa- 
vorable way? For what reasons is this the 
case? Which patients are seen negatively? Are anxiety and defensiveness
related to the judgments and attitudes therapists have toward patients?
These were some of the questions that led to an initial exploratory
study of therapists’ attitudes toward therapy candidates. This in turn
was part of a larger study of variables related to continuation and
progress in psychotherapy.

1 Presented in part at the Annual Meeting of the American Psychological
Association, Chicago, September 1960.


PRESENT STUDY 


In this investigation, therapists were asked 
to complete a brief questionnaire and check- 
list at staff meetings at which cases were 
discussed and considered for outpatient psy- 
chotherapy. The questionnaire included open- 
ended questions on assets, deficiencies, goals 
in therapy, and likely problems in therapy. 
Each therapist also was asked to rate each
patient on a four-point scale in terms of thera-
peutic prognosis: excellent, good, fair, or
poor. Similar ratings were requested concern-
ing the degree of anxiety in the patient, the
latter’s defensiveness or rigidity, the rater’s
personal feelings toward the patient, and the
rater’s interest in taking the patient on for
psychotherapy.


The ratings were secured in a regular out- 
patient staff meeting which met weekly for 2 
hours. Two to three cases were discussed at 
each meeting. These cases had been seen 
previously by a psychiatric resident and so- 
cial worker, and in about one-half of the cases 
by a psychologist. The intake reports were all 
read in their entirety. After the intake ma- 
terial was presented to the staff, but prior to 
any discussion of the case, each of the indi- 
viduals at the staff meeting was asked to fill 
out the questionnaire. 

Responses were secured from 20 different
therapists from three disciplines: psychiatry,
clinical psychology, and psychiatric social
work. The number of patients rated and
evaluated by each therapist varied from 7 to
32 with a median number of 18 patients. All
of the patients were individuals who had ap-
plied for outpatient psychotherapy and had
been recommended for intake evaluation by
the initial screening committee. The patients
consisted of 18 men and 20 women ranging
in age from 13 to 51 years, with a median
age of 27 years. In terms of diagnosis, the
group was as follows: Psychoneurosis, 16;
Personality Disorders, 14; Psychosis, 3; and
other diagnoses, 5.

TABLE 1
[Correlation coefficients between rated pairs of
variables, by therapist; cell values are not legible
in this copy.]

Numbers in parentheses [...]
A = Therapeutic prognosis
B = Degree of anxiety
C = Defensiveness and rigidity
D = Personal feelings
E = Interest in taking into treatment
* Significant at .05 level.
** Significant at .01 level.


RESULTS 


The responses secured from the therapists 
were tabulated for each category of response. 
In order to evaluate the reliability of the 
ratings, average intercorrelations were com- 
puted on the five raters who had seen at 
least 16 patients in common. Two of these 
raters were staff psychologists, one was a staff 
psychiatrist, and two were psychiatric resi- 
dents. Ebel’s (1951) technique for estimating 
the reliability of ratings was used. The judges 
showed a high degree of agreement in their 
ratings of therapeutic prognosis (r = .88),
personal feelings toward the patient (r = .79),
interest in taking the patient on for ther-
apy (r = .80), and patient’s anxiety level
(r = .88). Moderate agreement was obtained in
the judges’ estimates of the patient’s defen-
siveness (r = .68).
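
The reliability estimates above rest on an analysis-of-variance approach to rater agreement. The following is only a minimal sketch of one common form of that calculation, not the authors' own computation: it assumes a complete patients-by-raters table, assumes that differences between rater means are removed from the error term, and uses a hypothetical function name (`ebel_reliability`) and made-up ratings. The text does not say whether the reported coefficients refer to a single rater or to the average of the five raters, so both are returned.

```python
from statistics import mean


def ebel_reliability(ratings):
    """Intraclass reliability from a complete patients x raters table.

    ratings[i][j] is the rating given to patient i by rater j.
    Returns (reliability of a single rating, reliability of the mean of k ratings).
    Assumption: between-rater variance is excluded from the error term.
    """
    n = len(ratings)       # number of patients
    k = len(ratings[0])    # number of raters
    grand = mean(x for row in ratings for x in row)
    patient_means = [mean(row) for row in ratings]
    rater_means = [mean(row[j] for row in ratings) for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_patients = k * sum((m - grand) ** 2 for m in patient_means)
    ss_raters = n * sum((m - grand) ** 2 for m in rater_means)
    ss_error = ss_total - ss_patients - ss_raters

    ms_patients = ss_patients / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    single = (ms_patients - ms_error) / (ms_patients + (k - 1) * ms_error)
    average = (ms_patients - ms_error) / ms_patients
    return single, average


# Hypothetical four-point prognosis ratings: 6 patients rated by 3 raters.
example = [[4, 3, 4], [2, 2, 1], [3, 3, 4], [1, 2, 1], [4, 4, 3], [2, 1, 2]]
single_r, average_r = ebel_reliability(example)
print(round(single_r, 2), round(average_r, 2))
```

The two returned values are linked by the Spearman-Brown relation, so either can be recovered from the other once the number of raters is known.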

The ratings obtained were then intercor- 
related where appropriate. Thirteen therapists 
who rated at least 12 patients were used in 
this analysis, which forms the first part of 
our report. We shall discuss now the ratings 
of therapeutic prognosis and their relation-
ship to other judgments.

In line with other findings, it was hypothe- 
sized that degree of anxiety would be cor- 
related positively with prognosis whereas de- 
fensiveness and rigidity would be negatively 
correlated with prognosis (Rubinstein & Lorr, 
1956; Taulbee, 1958). These predictions were 
generally supported, although not to a marked 
degree. As can be seen in Table 1, all the 
correlations between therapeutic prognosis 
and degree of anxiety are positive, but less 
than a third of them are statistically signifi-
cant, with the median being .38. Ratings of
anxiety thus bear only a limited relationship
to ratings of prognosis. The relationship be-
tween prognosis and defensiveness appeared
to be somewhat more marked, with approxi-
mately half of the correlations approaching
significance. As might be anticipated, this re-
lationship was negative in all but one case.
Apparently, our judges react somewhat more 
strongly to defensiveness and rigidity in re- 
lation to prognosis than they do to anxiety in 
this regard. 

Ratings of prognosis, on the other hand, 
were highly correlated with positive feelings 
of the judges toward the patient. Only one of 
the correlations was not significant at least at 
the .05 level of confidence, with the median 
correlation being .66. The personal feeling of 
the raters thus appears most closely related 
to ratings of prognosis, or vice versa. Ratings 
of “interest in taking the patient into treat- 
ment” were also highly correlated with both 
ratings of therapeutic prognosis and the per- 
sonal feelings of the raters toward the pa- 
tients. The latter finding suggests that per- 
sonal feelings toward the patient, interest in 
taking the patient on for therapy, and judg- 
ments of prognosis may be manifestations of 
the same positive view of the patient. One 
cannot state whether the raters “like” patients 
with good prognosis, or whether a good prog- 
nostic rating is given to patients that the 
therapist reacts to personally in a positive 
fashion. 

The other findings were not as marked, al- 
though there was some negative relationship 
between the personal feelings of the rater and 
defensiveness of the patient. It would thus ap- 
pear that judgments of prognosis are most 
closely related to the personal feelings of 
therapist judges (or vice versa), and that the 
latter bear more relationship to judgments of 
defensiveness and rigidity than they do to 
judgments about the patient’s anxiety. This 
pattern generally is congruent with that 
recently reported by Strupp and Williams 
(1960). In studying two therapists, they
found that “nondefensive, insightful, likable
and well-motivated patients were seen as most
likely to improve in psychotherapy” (p. 440).


ASSETS FOR PSYCHOTHERAPY 


As mentioned previously, each rater also
was asked to list the therapeutic assets for
each patient as well as to indicate likely prob-
lems to be encountered in psychotherapy. A
total of 532 responses pertaining to patient
assets were obtained with a variable number
being listed for any given case. The average
number per patient was one and a half. After
a preliminary analysis was made of all the
individual responses, the results were grouped
into appropriate categories. Although a very
large number of responses were listed, these
could be classified with little difficulty into a
relatively small number of categories. All of
the items which were mentioned at least 10
times are presented in Table 2.

TABLE 2
[...] for Psychotherapy
[Frequencies are not legible in this copy; category
labels as far as they can be read:]
Intelligence
Anxiety
Motivation
Age
Insight, awareness o[...]
[...] discomfort
P[...]
Past adjustment
Ability to relate

As noted in Table 2, three categories make 
up over half of the listed assets, i.e., intelli- 
gence, anxiety, and motivation. When age and 
insight are added, these five account for over 
80% of all the assets listed for these patients. 
On the basis of such ratings, one might infer 
that the average therapist prefers a patient 
who is intelligent, anxious, well motivated for 
therapy, young, and with some insight into his 
difficulties! This seems to be borne out by an 
analysis of the assets listed for patients in re- 
lation to the ratings by our judges of personal 
feelings toward the patients. When the total 
group of patients is dichotomized in terms of 
the median ratings on this variable, it is noted 
that the group which receives the higher rat- 
ings also receives almost twice the frequency 
of listed assets. This difference is significant 
at the .01 level of confidence (χ² = 15.42, df
= 1). With the exception of age, the assets
mentioned are linked more frequently with 
patients given high personal preference rat-
ings by the therapists. The preferred therapy
patient, as inferred from these listings by our
sample of therapists, bears a close resem-
blance to the type of preferred patient men-
tioned in the research by Hollingshead and
Redlich (1958).

TABLE 3
DIFFERENCES BETWEEN MEAN SCALE RATINGS OF
12 TERMINATORS (T) AND 12 REMAINERS (R)
[Mean ratings for T and R are not legible in this
copy; scales listed:]
Defensiveness
Anxiety
Prognosis
Personal feelings
Interest in taking
* Significant at .05 level of confidence.
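
The statistic reported for the comparison of asset frequencies (15.42 with 1 degree of freedom) reads as a chi-square value. Assuming that reading, the stated significance level can be checked with the short sketch below. The helper name `chi2_upper_tail_df1` is hypothetical, and the identity it relies on (a chi-square variate with 1 degree of freedom is the square of a standard normal deviate) holds only for df = 1.

```python
import math


def chi2_upper_tail_df1(x):
    """Upper-tail probability P(chi-square > x) for 1 degree of freedom.

    With df = 1, chi-square is the square of a standard normal deviate,
    so the tail probability equals erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


p = chi2_upper_tail_df1(15.42)
# True if the reported value clears the .01 level of confidence.
print(p < 0.01)
```

The same helper reproduces the familiar critical values for df = 1: about 3.84 at the .05 level and about 6.63 at the .01 level.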

It is of interest also to comment on the 
variability with which the various assets were 
listed by different therapists. Intelligence, for 
example, was listed in one out of eight cases 
by one person, but in almost two out of three 
cases by another. In a similar fashion, anxiety 
or discomfort was given as an asset in over 
half of the cases by one therapist, but in only 
1 case out of 16 by another. Thus, while there 
is some consensus concerning desirable fea- 
tures in a psychotherapy patient, there is 
some variation among therapists concerning 
the frequency of emphasis on certain aspects 
of the patient. Our data are too meager at 
this point to permit us to infer any particu- 
lar relationship between the pattern of assets 
listed by specific therapists and other judg- 
mental variables. 


THERAPISTS’ JUDGMENTS AND DURATION OF
STAY IN PSYCHOTHERAPY


Of the 38 patients who were evaluated ini- 
tially at the outpatient staff conferences, 24 
were assigned to a therapist in our outpatient 
clinic, thus allowing for some follow-up study. 
All of the therapists who saw these patients 
participated in the initial rating procedures. 
The other 14 patients were referred to other 
agencies, clinics, or hospitals. In a few of 
these cases, no treatment was recommended. 
The median number of interviews kept for 
the group of patients assigned to therapy here 
was 17. (This atypically high figure may be 
somewhat misleading. There were 11 patients
who kept 12 or fewer interviews, a value which
is the same as that previously reported as the
median on a much larger sample of patients
[Garfield & Affleck, 1959].)

It was hypothesized that ratings of low 
defensiveness, high anxiety, good prognosis, 
positive personal feelings, and a positive in- 
terest in taking would be related to greater 
duration of stay in psychotherapy. Differ- 
ences between the mean ratings of all scales 
for patients above and below the median were 
analyzed. Each patient was rated by a median
of 9 raters, with a range of 6 to 14 raters.
Table 3 presents the results of these analyses.


The only rating which was significantly re- 
lated to duration of stay was prognosis; pa- 
tients remaining in therapy longer are rated 
initially as having a better prognosis. De- 
spite the moderate intercorrelation of prog- 
nosis with the personal feelings of the thera- 
pist, ratings of the latter were not significantly 
related to duration of stay. The failure of rat- 
ings of personal feelings, interest in taking 
the patient on for therapy, anxiety, and de- 
fensiveness to predict duration of stay is of 
interest in the light of our previous findings 
on therapists’ preferences. Tentatively, it ap-
pears that the sets therapists develop toward
patients on these dimensions are reliable, but
have no predictive validity as regards dura-
tion of stay. While interest in taking a pa-
tient on for therapy and the personal feelings
of the therapist toward the patient were sig-
nificantly correlated with ratings of prognosis,
and thus suggestive of a common view of the
patient, only ratings of prognosis appeared
related to duration of stay in psychotherapy.


SUMMARY AND CONCLUSION 


Our findings show a high degree of agree- 
ment among therapists in terms of judgments 
of prognosis, personal feelings toward patients, 
and interest in taking patients on for psycho- 
therapy. Ratings of these variables also show 
moderate intercorrelations. This suggests that 
certain patients have a high valence for thera-
pists and that there is agreement among thera-
pists as to who these patients are. A some-
what similar consensus was evident in the
listing of patient assets for psychotherapy
where five assets constituted 80% of the total
listed. In terms of the assets listed and the
negative feeling expressed by therapists to-
ward defensiveness in patients, it would ap-
pear that a positive reaction is expressed to-
ward the patient least difficult to work with
and, possibly, the person least in need of
skilled help. It was further demonstrated that
patients who evoke positive feelings from
therapists are characterized by those thera-
pists as having significantly more assets, par-
ticularly intelligence, motivation, anxiety, and
insight.

When the ratings were related to actual
duration of stay, it was found that patients
remaining in therapy longer were rated as
having a better prognosis. None of the other
ratings were significantly related to duration
of stay. While therapists show high agree-
ment in their preferences and personal feel-
ings for patients, these ratings were not re-
lated to actual duration of stay.


AN EMPIRICAL SCALE OF THERAPIST VERBAL ACTIVITY
LEVEL IN THE INITIAL INTERVIEW¹

EDMUND S. HOWE AND BENJAMIN POPE


University of Maryland School of Medicine 


Subjecting the psychotherapist to examina- 
tion as an independent variable reflects ac- 
ceptance of the proposition that, regardless of 
his theoretical orientation, what the therapist 
says is of central importance in the thera- 
peutic transaction. Subjecting him to similar 
examination in the initial interview implies 
that the therapist’s mode of verbalization 
may have an important bearing upon achieve- 
ment of his diagnostic or other goals. 

The last 20 or more years have seen two 
major transitions in the tactics and strategy 
of the initial interview, which have arisen 
largely as influences of Freudian psychoana-
lytic theory. On the one hand there has been
increased understanding that, no less than the
formal therapeutic interview, the initial inter-
view involves an interpersonal process influ-
encing both the patient and the therapist as
a participant observer. On the other hand,
simultaneously, there has been increasing de-
parture from the “fact gathering” typical of
the earlier psychiatric interview, to a process 
in which the patient is encouraged, through 
relative passivity on the part of the therapist, 
spontaneously to unfold his story as he him- 
self feels it. Thus, many contemporary writers 


1 This paper arises out of research supported by
Pilot Evaluation Grant No. 2M-6408 from the Na-
tional Institute of Mental Health of the National
Institutes of Health, United States Public Health
Service. The late Jacob E. Finesinger was the prin-
cipal investigator of this work until his untimely
death in June 1959. A paper based partly upon the
first four studies was presented by Pope, Howe, and
Finesinger to Division 12 at the Annual Convention
of the American Psychological Association in
Cincinnati, Ohio, September 1959. Completion of
Study 5 and of the present manuscript has been
facilitated during tenure by Edmund S. Howe, of
Research Grant M-3355, also from the National
Institute of Mental Health.


(e.g., Deutsch & Murphy, 1955; Finesinger, 
1948; Gill, Newman, & Redlich, 1954) at- 
tempt to arrive at some kind of working diag- 
nostic formulation during an initial interview 
not by eliciting a mass of factual information 
about various sectors and stages of the pa- 
tient’s life history; but instead by following 
the patient’s own leads, his sequential ac- 
count of himself, his life, and his difficulties. 
These transitions in the form of the initial 
interview can be described in terms of increas- 
ing adoption of the projective interview, in 
which it is now commonly accepted that one 
is apt to discover more information of a rele- 
vant nature either by remaining silent, or at 
most by asking rather vague, nonleading ques- 
tions onto which the patient may project his 
own referents, and his own interpretation of 
what is “meant.” In this way one learns much 
not only about circumstantial (factual) ma-
terial, but also about those contiguous motiva-
tional and associational processes which usu-
ally lie nearer to the heart of the matter.

The foregoing developments have given rise
to the concept of Therapist Activity Level
(e.g., Finesinger, 1948) with the attendant
implication that lower Activity Levels are
potentially more advantageous than higher
ones, for the purposes of gathering relevant
information, fostering the development of
transference reactions, and avoiding a shift
into a social or personal relationship with the
patient. (There are, however, obvious excep-
tions to any general rule, such as the use of
higher levels of activity for such supportive
purposes as encouraging the inhibited patient
to talk during an initial interview [Gill et al.,
1954] or to prevent acting-out behavior.)
These commonly assumed benefits of main-
taining a low level of verbal activity remain
hypothetical, however, since they have never
been subjected to experimental scrutiny. This
paper constitutes a preliminary basic step in
a research program the aim of which is to
evaluate the role played by, and the impact
upon the patient of, the therapist’s Activity
Level in the initial interview. The experiments
to be reported at this time were performed
(a) to examine the rateability of the concept
of Activity Level in terms of three assumed
attributes (to be discussed below); (b) to
develop an Activity Scale for subsequent
measurement procedures; and (c) to explore
some of the empirically controllable variables
that might affect the reliability of application
of such a scale to actual interview material.
The choice of a definition of Activity, how-
ever, presents a problem, for its attributes are
not clear, and have never been spelled out.
Deutsch and Murphy (1955) for example
made no attempt to define Activity, other than
implicitly by rejecting the question-and-an-
swer interview pattern, and instead proposing
a “process of facilitation through the selective
repetition in interrogative form of the pa-
tient’s remarks” (p. 18). Finesinger (1948)
likewise skirted the conceptual problem of
definition in expressing his preference for
Activity which is kept “as low as is consistent
with the attainment of therapeutic plans and
goals” (p. 192).

Several research workers (e.g., Bordin,
1955; Dibner, 1953; Osburn, 1951) have
accepted the term Ambiguity as a significant
aspect of therapist behavior. Dibner in fact
showed that certain consequences in the pa-
tient’s behavior (e.g., increased “anxiety”)
follow greater therapist Ambiguity. Bernstein,
Lennard, and Palmore (1958) likewise ob-
served greater “ease of communication” by
the patient following greater therapist Speci-
ficity (i.e., less Ambiguity). Several years
earlier Snyder (e.g., 1945) investigated Lead,
which he assumed to be a primary dimension
of therapist verbal behavior. (Indeed, it is
interesting to note that when Freud [1948]
himself abandoned hypnosis in favor of the
psychoanalytic technique, he contrasted the
suggestive nature of the former with the non-
leading character of the latter.) Finally, it is
considered that a therapist response may also
be looked at from the standpoint of the de-
gree of Inference which it carries, or which it
conveys to the patient. In the studies to be
described these three attributes, Ambiguity,
Lead, and Inference, will be used to charac-
terize what is meant by variations in Activity
Level. It was assumed for the purpose of
these studies that the three attributes are
moderately (if not highly) intercorrelated, so
that the three terms are to some extent in-
terchangeable. Thus, Ambiguity subjectively
feels as though it would be negatively corre-
lated with Lead and with Inference, whereas
the last two would be positively correlated
with each other. To this extent Activity is
assumed, for present purposes, to be one
dimensional.

METHOD

Study 1. A broad variety of [...] psychotherapy
interviews involving different patients, different
phases of treatment, and therapists of different
theoretical allegiance, were [...] source material
to compile a representative [...] of 50 abstract
descriptions of therapist verbal responses. Thirty
board-certified psychiatrists [...] each of these
descriptive responses (presented on individual
[...] cards) for Activity Level along an 11 point
scale. The [...] definition characterized Activity
Level [...]:

A high-active response from the therapist [...], of
course, necessarily one which has greater [...]. It
does, however, have relatively low [...] about it;
it involves a marked degree of Lead by the
therapist; and it carries a high degree of
Inference. Conversely, a low-active response is
ambiguous; it manifests a low degree of Lead [...]
the therapist; and it carries a low degree of
Inference. Thus, compare the following three
descriptive responses:

1. Therapist gives a general, unfocussed invitation
for the patient to talk.
2. Therapist asks the patient to describe the last
occasion when a pattern of symptoms occurred.
3. Therapist explores the patient’s feelings about
something just reported by the patient.

Going from 1 through 2 to 3, the responses become
less ambiguous, they show progressively [...] lead,
and they connote an increasing degree of inference.

Each rater also sorted a duplicate set of cards
into one of three groups: (a) responses primarily or
mainly diagnostic in purpose (ignoring secondary,
therapeutic value); (b) responses primarily or
mainly therapeutic in purpose (ignoring secondary,
diagnostic value); and (c) responses fitting neither
category. The two tasks were given in one of two
sequences to alternate subjects.
Results of Study 1. A Lindquist (1953, pp. [...]
273) Type I analysis of the 11 point rating data
established (a) an overall difference among the 50
[...]; (b) a nonsignificant Sequence main effect;
and (c) a nonsignificant Sequence × Items
interaction effect (p > .05). Since the Activity
ratings were thus clearly not altered as a result of
prior judgments of diagnostic vs. therapeutic value,
the rating data were then pooled. The interclass r
was .50; the reliability of the average ratings, .93
(Guilford, 1954).

After computation of appropriate chance frequencies
via Fisher’s Exact test (Siegel, 1956) it was
established that the 30 subjects agreed significantly
upon only 10 of the responses as being primarily of
“treatment value” and upon only 7 as being primarily
of “diagnostic value.” A comparison of the mean
Activity Levels of these two groups of responses via
the Mann-Whitney (1947) U test showed the responses
in the treatment category to be more active
(p < .001). This accords with commonsense
expectation, and constitutes a modicum of face
validity for the working concept of Activity.

Fig. 1. Median [...] levels of therapist responses
placed (with significant statistical agreement among
10 subjects) into five conventional [...] therapist
behavior.

Study 2. This was undertaken to examine the re-
lationship between Activity Level as rated in Study 1
and five conventional labels frequently applied, in
the contemporary literature on psychotherapy re-
search, to various categories of therapist operations.
Ten new subjects, four psychiatrists and six clinical
[...] experience, sorted the [...] subjective [...]
“Persuasive” response, rated most Active, was [...]
original set of 50 responses. Figure 1 [...] however,
for completeness over the range of therapist
operations actually studied [...] that there is
considerable overlap [...] therapist operations. The
findings [...] the representativeness of the therapist
responses, and a meaningful order of conventional
categories along the assumed dimension of Activity.
They accord, moreover, with the scheme for analysis
of Activity Levels and with the principles of focus
and of minimal activity set forth many years [...]
by Finesinger (1948).

[Table 3: parallel Activity Scales and mean Activity
Levels; contents are not legible in this copy.]

Study 3. One of the ultimate applied goals of the
research program was to study therapist verbal be-
havior in the initial interview. Since initial
interviews [...] to involve the more active types of
therapist operations (e.g., interpretive), 25 of the
[...] active responses were removed from the original
set of 50. To the remaining 25 responses, 11 more
were added. The new set of 36 responses was rated, as
before, along an 11-point scale of Activity, by three
groups of subjects. One group consisted of 15 of the
original psychiatrists used in Study 1. Two other
groups were drawn from a class of 100 freshmen medical
students. One group of 19 subjects had previously
expressed very high interest in ultimate specializa-
tion in psychiatry, while the other group of 18 sub-
jects had expressed very low such interest. Inclu-
sion of medical students with high and low interest
provided a check upon the independence of Activity
ratings from psychiatric experience and sophistica-
tion.

TABLE 4
MEAN ACTIVITY LEVEL AND STANDARD DEVIATION
FOR FIVE INITIAL INTERVIEWS (STUDY 4)
[Columns: theoretical orientation of therapist;
number of therapist responses*; mean Activity Level
per response over entire interview; [...]. Cell
values are not legible in this copy. Rows as listed:]
Rogers
Deutsch
Wolberg
Gill
Finesingerian

Results of Study 3. There was an overall between- 
group difference in mean Activity Level assigned the 
36 responses (see Table 1). The value of F was 5.82
(p < .01), the medical students with low interest in
psychiatry being most deviant from the psychiatrists.
The rank orders of the responses rated by the three 
groups nevertheless agreed fairly well. Values of rho 
(p< .001) are shown in Table 2. The interclass r 
was .49 for psychiatrists and .42 for each group of 
students; the reliability of average ratings was .93 
for all three groups (Guilford, 1954). These values 
are almost identical with those found in Study 1.
Indeed, the value of rho for the mean Activity 
Levels of the 25 responses common to Studies 1 and 
3 was .945 (p< .001). The data indicate that reli- 
ability of the rating procedure is but little altered 
by psychiatric interest and experience. 
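
The reliabilities of average ratings quoted for Study 3 can be approximately reproduced from the single-rater coefficients and the group sizes given in the text (15 psychiatrists, 19 and 18 medical students). The sketch below assumes that the "reliability of average ratings" is the Spearman-Brown step-up of the single-rater coefficient, which is the usual way such a figure is obtained; the function name `reliability_of_mean` is hypothetical, and the source does not spell out the formula it took from Guilford (1954).

```python
def reliability_of_mean(r_single, k):
    """Spearman-Brown step-up: reliability of the mean of k raters,
    given the reliability (average intercorrelation) of a single rater."""
    return k * r_single / (1 + (k - 1) * r_single)


# Single-rater coefficients and group sizes as reported for Study 3.
groups = [
    ("psychiatrists", 0.49, 15),
    ("students, high interest", 0.42, 19),
    ("students, low interest", 0.42, 18),
]

for label, r_single, k in groups:
    print(label, round(reliability_of_mean(r_single, k), 3))
```

All three stepped-up values fall within about .01 of the reported .93, which is consistent with rounding of the single-rater coefficients.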

Consequently, data from the psychiatrists
in Study 3 were used to form two parallel Activity
Scales. These are presented in Table 3. Each ordinal
pair of items was matched on the basis of virtually
identical mean Activity Levels and of nonsignifi-
cantly different variances.

Study 4. This study was performed to make a pre-
liminary test of the reliability and discriminatory ca-
pacity of Scale A. The authors independently rated,
in context, each therapist response in five unfamiliar
published initial interviews. These, chosen for their
divergence of theoretical adherence, were performed
by Wolberg (1954, pp. 690-699); Deutsch (Deutsch
& Murphy, 1955, pp. 29-49); Skinner (Finesinger &
Powdermaker, undated); Gill (Gill et al., 1954, pp.
134-204), and Rogers (1947, pp. 128-142). Since the
Finesinger and Powdermaker interview was actually
performed by a close adherent to the Finesinger tech-
nique, all subsequent reference to this interview will
be via the term “Finesingerian.”

Results of Study 4. Reliability of scoring the five
interviews was .90 or better. Table 4 shows mean
and σ of each interview for one of the two raters.
The mean values differ from each other both by
Fisher’s F and by Kruskal-Wallis’ (1952) H
(p < .001). This result supported the assumption that
the Activity Scale samples a meaningful common
variable in therapist verbal behavior, and hence
justified a more powerful and elaborate reliability
study.

Study 5. This was undertaken (a) systematically
to assess the range of reliability estimates obtained
when professional but untrained raters apply the
Activity Scales [...] printed interview material; and
(b) [...] the empirical equivalence (i.e., the
interchangeability) of the two parallel Activity
Scales (see Scales A and B, Table 3).

Eight raters consisting of four clinical psycholo-
gists at the PhD level and four psychiatrists having
between 2 and 4 years of experience were used in a
modified latin square study adapted from Cutler,
Bordin, Williams, and Rigler (1958). For each of
four interviews the subject rated successive therapist
responses seriatim for Activity Level, using either
Scale A or Scale B. Each scale was used with a dif-
ferent pair of the four interviews presented to each
subject. In order to control for the possibility that
ratings of the therapist responses might be influenced
by the succeeding response from the patient, only
the therapist responses were presented for two of
the interviews rated by each subject (the “Context
Absent” condition); whereas for the other two
interviews
[...] Context variable, [...] systematically varied
[...] design shown in Table 5. It will be noted that
four pairs of subjects (one psychiatrist and one
clinical psychologist) were treated identically with
one of four sets of treatment combinations.

TABLE 5
EXPERIMENTAL DESIGN OF STUDY 5
[Columns: Rater; Scale A, Context Absent (a); Scale A,
Context Present (a); Scale B, Context Absent; Scale B,
Context Present. Rows: eight raters, alternating PhD
and MD. Cell entries, each an interview numeral (b)
and an order of presentation (c), are not legible in
this copy.]
a “Context Absent” implies that the patient’s responses
were not presented to the subject; “Context Present”
implies that they were so presented.
b Roman numerals refer to a particular interview (see
text).
c Arabic numerals refer to the order of presentation to
the subject, of a specific [...] particular treatment
combination.

The four interviews were selected from those used
in Study 4. They are hereafter [...] as Interviews I
(Wolberg) consisting of 73 therapist responses;² II
(Gill), 73 responses; III (Finesingerian), 62
responses; and IV (Rogers), 48 responses.
In order to keep the subject’s task within a reason-
able time limit, only the first 73 responses of inter-
views I and II were used, while the other two in-
terviews were used in their entirety. The total time
taken for the four tasks varied from 1 to [...] hours
per subject.

2 It was necessary to ignore certain types of
therapist responses (two from each interview [...])
[...] initial greeting or a farewell [...]

Results of Study 5. The analysis of variance is
shown in Table 6. The most clear-cut and crucial
fact established by the analysis is that the mean Ac-
tivity Levels of the four interviews significantly
differ among themselves (p < .001). These mean
values are presented in Table 7. The rank order of
the four interviews accords exactly with that found
in Study 4. It is noteworthy that the Rogerian in-
terview (IV) turns out to be the most active and
the Finesingerian one the least active. The sole other
significant effect is for the Scale variable (p < .05).
This accords with an a priori hunch, but it is never-
theless potentially somewhat disturbing; for it will
furthermore be seen later that for three of the inter-
views, Scale A manifests greater reliability than does
Scale B. While the context variable turns out not to
be significant (which is as it should be), it should
be noted, for the present, that this finding refers only
to overall mean values for each interview.

TABLE 9
INTERRATER RELIABILITIES AS A FUNCTION OF
EXPERIMENTAL CONDITIONS
[Columns: combination of experimental conditions;
interview number [...]; median*. Cell values are not
legible in this copy. Rows as listed:]
Same scale, same context
Different scale, different context
Same context, different scale
Same scale, different context
Note. “Same scale” implies that [...] rated with Scale
A, or with Scale B; [...] context present, the other
[...]. For values of N, [...] significance, see the
general [...].
* The median r [...] of the other three [...].

3 Michael S. Black, now of the University of Illi-
nois, performed most of the tedious office computa-
tions in Study 5.

The between-rater reliabilities for each interview
were computed by IBM, yielding a total of 4(8 × 7)/2
= 112 reliability coefficients (Pearson r’s). One sum-
mary of these, presented in Table 8, breaks the data
down into a PhD group (clinical psychologists) and


an MD group (psychiatrists). The PhD group shows nonsignificantly greater median within-group agreement⁴ than does the MD group on Interviews I and II. The same two interviews were rated more reliably than Interviews III and IV, by both the PhD group and the “all subjects” group (p < .01). That the Rogers interview (IV) elicited low rater agreement was not at all surprising, since many Rogerian responses do seem extremely difficult to match with scale items, and considerable argument was voiced by several subjects that their ratings of this interview were subjectively most unreliable. The equally low reliability of the Finesingerian interview (III), however, was rather surprising, and no satisfactory explanation of this finding is forthcoming.

4 All comparisons of r’s subsequently reported in this paper were made, unless otherwise stated, with the Median test (e.g., Siegel, 1956, pp. 111-116).

Table 9 presents the median and range of interrater r’s as a function of the various experimental conditions. Generally speaking, the highest reliability coefficients are obtained when comparisons are made with the same scale, and lowest when they are made with ratings based upon different scales; but the differences do not achieve conventional significance. The reliabilities of both Interviews I and II are consistently arithmetically larger than those of Interviews III and IV within all four experimental treatment combinations. Only 1 of the 16 possible comparisons yields a significant value of p (see Table 9 level of significance footnote), while the median value of p for the set is about .13. The effect of the Context variable is slight when assessed from the data presented in Table 9. But a somewhat different picture is presented in Table 10, where the reliabilities are examined within pairs of raters (one PhD and one MD), each of the two members having been treated identically. For Interview I the Context variable produces fairly consistent correlations within pairs of subjects using either Scale A or Scale B. For Interview II, however, the correlation is greater for the Scale B condition when the patient context is present than when it is absent (p < .05). For Interview III the latter finding holds for both scales but does not achieve conventional significance.

A surprising finding for Interview IV is that when the patient context is present the reliabilities drop, to near zero for Scale A (p > .05) and by 50% for Scale B (p = .06)! This type of finding was reported also by Cutler, Bordin, Williams, and Rigler (1958), whose analyst-fledgling subjects agreed significantly less in ratings of Depth of Interpretation when they had patient material available to them. In the present study this finding is taken to reflect (again) the difficulties involved in rating the Rogerian material.

A comparison of all r’s involving Scale A with complementary r’s involving Scale B (Table 10) shows that for Interviews I and II, consistently greater within-pair agreement (p < .02) is obtained with Scale A. The same comparisons for Interview III fall short of significance, although they too are in a consistent direction. For Interview IV (Rogers), on the other hand, exactly the opposite outcomes are observed, of which one is significant (p < .05) and the other nearly so (p = .06).
DISCUSSION


The empirically derived Activity Scales 
facilitate ratings having average reliability 
which is moderate (.51) for untrained raters 
(Study 5) and very high (.91) for well- 
trained raters (Study 4). The Activity Scales 
satisfactorily discriminate among the inter- 
views employed. In Study 5, however, dif- 
ferences among the reliabilities of the four 
interviews are considerable; the values for In- 
terviews III (Finesingerian) and IV (Rogers) 
being considerably lower than the estimates 
for the other two. The Rogers interview in 
addition leads to two unexpected discrepancies 
requiring discussion. 

A problem facing all of those who experiment with ratings of therapist verbal behavior concerns the selection of subjects. The natural, defensible tendency is to obtain the services of highly trained “experts.” This, of course, raises serious questions of practical availability, since the expert not only has less time to donate to research workers, but he is also in much shorter supply than the nonexpert. While in Study 5 subjects with the PhD may have shown a slight, but inconsistent edge over those with the MD, the overall results indicate that the reliability of performance is very much more a function of experimental conditions than of professional specialty. A conclusion comparable in principle was reached by Cutler et al. (1958).

TABLE 10

[...] ACTIVITY LEVEL [...] WITHIN EACH PAIR [...] OF IDENTICAL TREATMENT

[Table values not recoverable from the scan. Columns: Scale A and Scale B, each under Patient Context Absent and Present. The note is partly legible: for values of N and levels of significance, see the general footnote [...]. Values for Scale A are larger (p < [...]) than those for Scale B within Interviews I and II. Values for Scale B are larger ([...] .06) than those for Scale A in Interview [...]. In Interview II, Scale B, the r for Patient Context Present [...] that for Patient Context Absent (p < .05). [...] r for Patient Context Absent, Scale B, [...] Patient Context Present (p [...] .06).]

The two Activity Scales not only led to different overall mean ratings of Activity Level; interrater reliabilities also differed as a function of the particular scale. Scale A was more reliably employed for three of the interviews, Scale B being more reliably used with Interview IV (Rogers). This is somewhat alarming, because the selection of particular illustrative points along the empirical scale dimension was in the present case (and presumably was in several other reported studies, e.g., Harway, Dittmann, Raush, Bordin, & Rigler, 1955) largely an arbitrary matter. The empirical differences between the scales thus raise an important theoretical issue which now deserves comment.

On the one hand it is quite possible that the two scales have different dimensionalities, Scale B being, say, two-dimensional, and Scale A one-dimensional. (One-dimensionality of the Activity continuum has heretofore been assumed, but not empirically proven.) On the other hand, it is also possible that the (Rogerian) responses in Interview IV are in toto two-dimensional, while those of the other three interviews can be adequately represented with a single dimension. Granted these two contingencies, it would follow that, with Interview IV, Scale B would elicit greater interrater reliability than Scale A. It is furthermore suggested that Interpretation may constitute this second dimension among both the items of Scale B and the therapist responses in Interview IV.

The plausibility of the foregoing hypotheti- 
cal argument may be clearer in the light of 
the following considerations. The “reflection,” 
which is the basic and most frequent verbal 
operation in the Rogerian interview (Rogers, 
1951) presumably takes one some distance 
along a dimension of interpretation. In con- 
trast, the other three interviews used in 
Study 5 between them contain less than a 
half-dozen responses that could be classified 
as “interpretive.” Furthermore, inspection of 
Table 3 shows that Item 8 in Scale B con- 
tains the sole reference (in either scale) to 
“reflection of feelings.” It is likely that sub- 
jects employed this category to classify those 
responses in Interview IV which were typi-
cally Rogerian in nature,⁵ whereas in Scale A
no comparable item lay at the subject’s dis-
posal.⁶ Consequently, the reliability of Inter-
view IV would turn out, as suggested above,
to be higher with Scale B than with Scale A. 

When one speaks of Scale B as facilitating “higher reliability” of ratings for Interview IV it must be noted, however, that ratings of therapist responses in this interview markedly drop under both scale conditions when patient context is added. This finding is quite opposite to that for Interviews I, II, and III; and furthermore is not in accordance with logical expectation. For, depending upon the nature of some given “exploratory question,” say, from the therapist, there should be predictable effects upon rating-reliability, of the addition of patient context. If the referents of the therapist question are absolutely clear to the rater (i.e., if the referents are completely defined by the question per se) then addition of patient context should not affect the reliability of rated Activity Level. If, however, the referents of the therapist question are not entirely clear to the rater, then addition of patient material should raise (but never lower) the reliability of rated Activity Level for the particular question.

5 At least 5 of the subjects were clearly aware that Interview IV was Rogerian.

6 A rough check bearing out the tenability of this hypothesis is as follows. A frequency count across all eight raters was made of the frequencies with which, for each interview, the eighth item in Scale A and Scale B were employed. For Interview I, the respective proportions of responses classified in Item 8 were .15 for Scale A, and .02 for Scale B. For Interview II the respective proportions were .02 and .04; and for Interview III, .00 and .10. For the Rogerian interview (IV), however, the proportions were .30 for Scale A, and .53 for Scale B.

The foregoing suggests that the very pres- 
ence of patient context during rating of Ro- 
gerian responses in Interview IV somehow 
undermined the subject’s understanding of 
what a Rogerian reflection of feeling looks 
like. Indeed, at least in the particular inter- 
view studied here, the reflection frequently 
does not seem, subjectively, to bear any con- 
sistent contextual relation to whatever fol- 
lows from the patient. 

From the standpoint of theory and research 
it is desirable to examine in more detail this 
question of dimensionality with respect to 
both Scale B and the Rogerian therapist be- 
havior of Interview IV, in hope that a modi- 
fied scale might be assembled having high 
reliability with both non-Rogerian and Ro- 
gerian material. But this whole issue is of 
course an applied offshoot of the more gen- 
eral and fundamental question of the dimen- 
sional relations between elicitation of infor- 
mation, and interpretation of information 
(which according to the results of Studies 1 
and 2 involves relatively high Activity Level). 
A relevant study along the lines of one by 
Raush, Sperber, Rigler, Williams, Harway, 
Bordin, Dittmann, and Hays (1956) is to be 
performed in the near future. 

In summary, it is felt that, subject to the 
restrictions outlined above which demand fur- 
ther clarification, we have examined some of 
the critical variables affecting the reliability 
of ratings of therapist Activity Level; and 
that the scales themselves are sufficiently 
meaningful and reliable to justify their ap- 
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plication in further research on the initial as 
well as the therapeutic interview. Attention 
may now be turned toward specification and 
examination of relevant variables in the pa- 
tient’s behavior as a function of therapist 
Activity Level. 


SUMMARY 


This paper describes the development of a 
parallel pair of scales for assessing the Ac- 
tivity Level of discrete Therapist Verbal Re- 
sponses, and the application of the scales to 
several published initial interviews. In Study 
1, 30 Board-certified psychiatrists rated 50 
abstract descriptions of Therapist Verbal Re- 
sponses along an 11-point scale of “Activity,” 
the latter being defined in terms of the de- 
gree of “Ambiguity, Lead, and Inference.” 
Interjudge reliability was .50, and the intra- 
class r, .93. Each rater also categorized the 
50 responses according to whether he con- 
sidered them primarily used for purposes of 
treatment, or for purposes of diagnosis. Those 
therapist responses agreed to be primarily 
“therapeutic” in purpose were rated with a 
considerably higher mean Activity Level than 
others classified as “diagnostic” in purpose. 

In Study 2 it was shown that a large ma- 
jority of the 50 therapist responses was agreed 
by independent judges to typify one of the 
following conventional categories of therapist 
operation: Simple Facilitation, Exploration, 
Clarification, Interpretation, and Supportive
Reassurance. The responses classified in these 
successive categories, respectively, showed in- 
creasingly higher mean Activity Levels. Con- 
sequently, it was assumed that the main set 
of 50 responses included representative ele- 
ments from the entire range of typical thera- 
pist operations. 

Study 3 involved further rating of a re- 
vised set of 36 Therapist Verbal Responses 
belonging mainly in the categories of Simple 
Facilitation, Exploration, and Clarification. 
The subjects consisted of 15 of the psychia- 
trists used in Study 1, and 37 freshmen medi- 
cal students, 19 with high, and 18 with low 
interest in psychiatry. Reliability of ratings 
was only slightly lower for student subjects 
than for psychiatrist subjects; and students with low interest in psychiatry showed least (though still highly significant) agreement with the psychiatrists’ rank ordering of the therapist responses. Since the rating procedure did not appear to be a serious function of either professional interest or experience, a parallel pair of 10-point scales of Activity Level was assembled using data from the psychiatrist subjects.

In Study 4 the individual therapist responses of five unfamiliar published initial interviews were rated by both authors, using Scale A. Interjudge correlation was .90 or better. A more elaborate and rigorous reliability study was then performed.

In Study 5 the two Activity Scales were then employed in a Latin square design requiring eight untrained raters (four psychiatrists and four clinical psychologists) to rate for Activity Level the therapist responses in four widely differing published initial interviews (by Wolberg, by Gill, by a Finesingerian, and by Rogers). Scale A vs. Scale B constituted one factorial variable, and Patient Context Absent vs. Present constituted the other. The analysis of variance showed a significant difference among the interviews, and a significant main effect for the Scale variable. When interjudge reliabilities were examined the two types of subjects (psychiatrists and psychologists) showed only minor differences. Further, the Wolberg and the Gill interviews were consistently more reliably rated than were the other two. Scale A, however, was consistently more reliably employed than Scale B with three of the four interviews, but Scale B was more reliably employed with the fourth (Rogerian) interview.


Furthermore, while adding Patient Context 
either increased or did not affect reliability 
of rating (with either scale) of the first three 
interviews, the reliability of rating the Ro- 
gerian interview clearly decreased. 

The discrepancies involving the Rogerian 
interview were discussed, and a hypothetical 
basis for their occurrence was advanced which 
concerned the dimensionalities of the two 
scales and of Rogerian vs. non-Rogerian 
therapist responses. It is concluded that while 
the general problem of dimensionality needs 
further examination, we have a pair of 
parallel Activity Scales the reliabilities of 
which are comparatively satisfactory (the 
grand median of 112 coefficients is .50), and
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that we have explored some of the conditions 
likely to affect their application by untrained, 
professional raters. One may now turn to- 
ward investigation of patient variables as a 
function of Therapist Activity Level. 
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THE ACCURACY OF CLINICAL PSYCHOLOGISTS’ 


ESTIMATES OF INTERVIEWEES’ INTELLIGENCE¹

ZANWIL SPERBER AND [...]

Children’s Hospital of Philadelphia, Pennsylvania

That accurate appraisal of intellectual capacity can best be accomplished using standardized test procedures is well accepted. There are often occasions, however, when clinical decisions are influenced by judgments of intelligence which must be based on observations rather than tests. It is therefore important to ask, “How good are clinicians’ estimates of intelligence?” The purpose of this paper is to present data indicating the relationship between clinical psychologists’ estimates of intelligence, based on observations of only the verbal behavior of interviewees, and psychometric measures of the interviewees’ intelligence.


METHOD

Subjects. Five clinical psychologists served as judges. Four were PhDs. All had substantial experience with intelligence testing of adults and children and were familiar with current approaches to the conceptualization and measurement of intelligence.

Interviewees. The women whose IQs were estimated had been interviewed as part of a follow-up study of their children, 4 years of age, who had been treated medically at birth (see Footnote 1). The sample size was set at 30 [...] required for data collection given the [...] the volunteer judges could be [...]. We also felt this N was large [...] meaningful statistical results. The 30 cases were drawn using a stratified random procedure to be representative [...] and mean of the intelligence test scores of the larger follow-up group. A social class categorization of husbands’ occupations (Sarason & Mandler, 1952; Sperber, 1959) shows that the sample included members of [...] middle, lower middle, skilled, and unskilled worker classes. Table 1 shows the age, education, and measured IQ of the sample.

Interviews. The hour-long interviews were conducted by psychiatrists and focused on the child’s developmental progress, and on maternal attitudes. Attempts were made to elicit some description of the mother’s life history. The interviews were tape recorded and verbatim transcripts typed. Judges 1, 2, and 3 made intelligence estimates after reading the transcripts, Judges 4 and 5 after hearing tape recordings. The judges had no other contact with the mothers.

Intelligence criteria. An abbreviated WAIS (Wechsler, 1955) consisting of four subtests, Vocabulary, Information, Block Design, and Picture Arrangement, was routinely administered to the mothers. The four subtests give a good approximation to the IQ obtained using the full scale (Cohen, 1957; Doppelt, 1956; Himelstein, 1957). Hereafter the prorated IQ based on scores on the four subtests will be called the WAIS IQ. WAIS IQs were available for 29 women. In the last phase of the follow-up study only the Vocabulary subtest was given. One of the mothers inadvertently included in our sample was from the later group.

Since verbal production served as the judges’ primary source of information about the interviewees’ intelligence, a prorated IQ based only on the interviewees’ scores on the WAIS Vocabulary test was used as a second criterion. Hereafter the prorated IQ based on the Vocabulary subtest will be called the Vocabulary IQ.

1 The interviews and intelligence test data used in the present research were collected as part of a follow-up study of children who had had blood problems as neonates, in most cases involving Rh incompatibility and treated by exchange transfusions. The follow-up study was supported by the National Institute of Neurological Diseases and Blindness, National Institutes of Health, United States Public Health Service, as part of the Collaborative Project to Study the Etiology of Cerebral Palsy and Other Neurological Diseases of Infancy and Childhood.

T. McNair Scott is Senior Investigator for the Collaborative Project at Children’s Hospital, and T. R. Boggs, Jr., pediatrician; C. Kennedy, neurologist; and J. A. Rose, psychiatrist, are co-investigators for the Collaborative Project and the follow-up study.

We appreciate the contribution made by our psychologist colleagues who served as judges, and Elizabeth Hirshman’s assistance with the statistical computations.

[Table 1, giving the age, education, and measured IQ of the interviewees, is not recoverable from the scan.]

TABLE 2

CORRELATIONS BETWEEN JUDGES’ IQ ESTIMATES AND OTHER CRITERIA

[Table values not recoverable from the scan. Columns: one per judge. Rows (IQ criteria): WAIS, Vocabulary, and “Other Judges”(a).]

a Entries are mean correlations between indicated judge’s estimates and those of the other four judges.
* p < .05; all other correlations are statistically significant at the p < .01 level.

Procedure. Judges were asked to assess the interviewees’ IQs with no further discussion of how they should define intelligence or use the interview material. They made their judgments independently, specifying an exact number for estimates between 70 and 140, and indicating after each estimate whether the judgment had been made with high or low confidence.

Judges were aware of the general nature of the 
follow-up study and knew the mothers had taken an 
abbreviated WAIS. For each case they were told the 
age of the child who was the subject of the inter- 
view. Cases were prearranged to form five random 
sequences of IQs. Each judge followed a different 
sequence in making his estimates. 


RESULTS 


Correlational analyses. Product-moment cor- 
relations were calculated between each judge’s 
estimates and (a) the WAIS IQ, (b) the Vo-
cabulary IQ, and (c) each other judge’s IQ esti-
mates. The 10 coefficients between estimated
and measured IQ, presented in Table 2, are 
positive and, with one exception, substantial. 
The estimates of all possible pairs of judges 
were positively correlated at the .01 level. 
Table 2 shows the mean correlations between 
each judge’s estimates and those of the other 
judges. Presence of voice cues did not influ- 
ence the correlations between judges’ esti- 
mates, or between estimates and criteria. 
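The correlational analysis just described can be illustrated with a minimal sketch. All IQ values below are invented for illustration (the study's raw data are not given in the text), only three judges are shown, and the helper names `pearson_r` and `mean_r_with_others` are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: measured IQs and three judges' estimates for six cases.
wais_iq = [88, 95, 102, 108, 115, 124]
estimates = {
    "Judge 1": [92, 98, 100, 105, 118, 120],
    "Judge 2": [85, 100, 104, 103, 110, 126],
    "Judge 3": [95, 94, 106, 112, 109, 117],
}

# (a) Each judge's estimates against the criterion.
criterion_r = {judge: pearson_r(est, wais_iq) for judge, est in estimates.items()}

# (c) Mean correlation between one judge and each of the other judges.
def mean_r_with_others(judge):
    others = [pearson_r(estimates[judge], est)
              for name, est in estimates.items() if name != judge]
    return sum(others) / len(others)

for judge in estimates:
    print(judge, round(criterion_r[judge], 2), round(mean_r_with_others(judge), 2))
```

The "Other Judges" row of Table 2 corresponds to the second quantity, computed in the study over the other four judges.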


Discrepancies between measured IQ and 
estimated IQ. The mean IQ assigned by each
of the five judges ranged from 100.5 to 105.3, 
corresponding closely to the test means shown 
in Table 1. The SDs of judges’ estimates were 
somewhat smaller than the SDs of test scores. 
Four judges restricted their estimates to an 
80-126 IQ range. 

As indicated in Table 3 judges’ estimates 
often deviated appreciably from the measured 
IQs. Over all five judges the mean discrep- 
ancy between the estimated IQs and WAIS 
IQs was 7.8 points. The mean discrepancy 
from the Vocabulary IQ was 9.9 points. Con- 
sidering only the WAIS criterion in relation 
to which judges’ estimates were more accu- 
rate, 83% of the estimates made by the most 
accurate judge, and 66% made by the least 
accurate judge, were within 10 IQ points of 
the criterion. Over all five judges, 72% of 
the estimates were within this range. 
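The discrepancy measures reported here (mean absolute discrepancy, range, and the proportion of estimates within 10 IQ points) can be sketched as follows. The scores are invented for illustration, the function name `discrepancy_summary` is hypothetical, and treating "within 10 points" as inclusive is an assumption. The second call applies the same summary to a judge who always answers 100.

```python
def discrepancy_summary(estimated, measured, tolerance=10):
    """Mean absolute discrepancy, proportion within `tolerance`, and (min, max)."""
    diffs = [abs(e - m) for e, m in zip(estimated, measured)]
    mean_diff = sum(diffs) / len(diffs)
    within = sum(d <= tolerance for d in diffs) / len(diffs)  # inclusive bound (assumption)
    return mean_diff, within, (min(diffs), max(diffs))

# Hypothetical measured IQs and one judge's estimates for five interviewees.
measured = [85, 96, 104, 110, 123]
judge = [92, 100, 101, 118, 112]

# A "programed" judge who estimates every interviewee's IQ as 100.
programed = [100] * len(measured)

judge_summary = discrepancy_summary(judge, measured)
programed_summary = discrepancy_summary(programed, measured)
print(judge_summary)
print(programed_summary)
```

With these made-up numbers the constant guess does worse on all three measures, which mirrors the direction of the comparison the authors report but says nothing about its size in their data.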


Table 3 presents the result of two additional analyses. Knowing that the WAIS was standardized so that the mean IQ score of a sample representative of the “population at large” would be 100 (Wechsler, 1955) how accurate would a clinician be if he simply “programed” himself to estimate each interviewee’s IQ as 100?² As shown in Table 3, the mean, SD, and range of discrepancies obtained by a hypothetical programed judge would have been larger than those of our judges. Table 3 also shows the mean, SD, and range of discrepancies between the interviewees’ WAIS IQs and their Vocabulary IQs. Despite test overlap, the discrepancies between the two psychometrically derived IQs are not appreciably less than those observed between judges’ estimates and the WAIS criterion.

2 We wish to thank our colleague, Edna Small, who suggested this analysis.

TABLE 3

DISCREPANCIES BETWEEN ESTIMATED AND MEASURED IQ

[Most table values are not recoverable from the scan. Legible row labels: Assumed population mean (IQ = 100), WAIS IQ, Vocabulary IQ. Legible entries: ranges of 0-35 and 0-49 (rows not identifiable); 6.3 and 0-21 on the Vocabulary IQ row. Note: N = 29 for WAIS criterion; N = [...] for Vocabulary criterion.]

Judges’ confidence and their IQ estimates. The five judges differed markedly with respect to the confidence with which they made the IQ estimates. Judge 1 was highly confident on only four estimates, Judges 4, 5, and 3 on 13, 15, and 16 estimates, respectively, while Judge 2 was highly confident on 23 estimates. This degree of variability suggests that the source of confidence was unique to the judge and not a function of some observable attribute of the interview material. To test this supposition judges were paired in all 10 possible combinations. The percentage of cases where both judges agreed in feeling either high or low confidence was compared to the percentage of agreements expected by chance. Agreement ranged from 30% to 53%, falling below chance expectancy for five pairs of judges.

The degree to which judges tended to feel highly confident after making IQ estimates was examined in relation to their performance. The judges were ranked for confidence level, assigning Rank 1 to the judge with the largest number of high confidence estimates. The criterion for a judge’s performance as an IQ estimator was the average discrepancy between his IQ estimates and the WAIS IQs, the judge with the smallest average discrepancy being assigned Rank 1. The obtained rank-order correlation was .90, significant at p < .05 (Senders, 1958, p. 545, Table M).

Within judges there was a slight but consistent reversal in the relationship between confidence and accuracy. Biserial correlations between the judge’s confidence (high or low) and size of the discrepancies between his estimates and the WAIS IQs ranged from .03 to .18, indicating that judges tended to be more accurate on those estimates they made with less confidence.
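The rank-order correlation between confidence and accuracy can be illustrated with the usual Spearman formula for untied ranks. The counts of high-confidence estimates are those given in the text; the accuracy ranks are hypothetical, since the paper does not list each judge's average discrepancy, and were chosen only to show one arrangement that gives a coefficient of .90 with five judges.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman rank-order correlation for untied ranks."""
    n = len(ranks_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

# Number of high-confidence estimates per judge, as reported in the text.
high_confidence = {1: 4, 2: 23, 3: 16, 4: 13, 5: 15}

# Rank 1 goes to the judge with the most high-confidence estimates.
ordered = sorted(high_confidence, key=high_confidence.get, reverse=True)
confidence_rank = {judge: i + 1 for i, judge in enumerate(ordered)}

# Hypothetical accuracy ranks (Rank 1 = smallest mean discrepancy); not from the paper.
accuracy_rank = {2: 1, 5: 2, 3: 3, 4: 4, 1: 5}

judges = sorted(high_confidence)
rho = spearman_rho([confidence_rank[j] for j in judges],
                   [accuracy_rank[j] for j in judges])
print(confidence_rank, round(rho, 2))
```

With five untied ranks, a coefficient of .90 corresponds to a sum of squared rank differences of 2, that is, a single swap of two adjacent ranks.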


DISCUSSION 

The results are consistent with the findings 
of other investigators who have reported cor-
relations between intelligence estimates made
without benefit of psychometric data and in- 
telligence test scores (Hanna, 1950; Marsh & 
Perrin, 1925; Wilson, 1954). Substantial dis- 
crepancies between measured and estimated 
IQs did occur in individual cases, but 72% 
of the judges’ estimates were within 10 points 
of the WAIS criterion, and those of the more 
accurate judges did not deviate from the 
WAIS IQ any more than did a measured 
IQ based on the WAIS Vocabulary score. 

Judges apparently are realistic in deciding 
how much confidence, in general, to place in 
their IQ estimates. There was a direct rela- 
tionship between the number of judgments 
made with high confidence and the accuracy 
of judges. Written comments volunteered by 
three judges suggest that the contrary tend- 
ency for all judges to be a little more accu- 
rate on low confidence judgments compared 
to their own high confidence judgments was 
a function of their considering additional as- 


pects of the interviews on cases perceived as 
difficult to evaluate. 


Some of the larger discrepancies in judg- 
ment occurred because judges overestimated 
the IQ of the cases of low intelligence and
underestimated the high extremes. A trained 
observer’s estimate is, therefore, not to be 
considered a substitute for a good intelligence 
test where precise data are required, although 
it may be sufficient when a general idea of 
the client’s intellectual capacity is all that is 
needed. Within these limits the present study 
indicates that experienced psychologists can 
make clinically useful estimates of inter- 
viewees’ intelligence. The findings should not 
be generalized to teachers, parents, physi- 
cians, or other judge groups without further 
research. 


SUMMARY 


Five clinical psychologists estimated the 
IQs of 30 mothers who had been interviewed 
by psychiatrists, three judges after reading
transcripts and two after hearing tape re-
cordings of the interviews. IQ estimates were
compared with the prorated IQ based on 
WAIS subtests. 

Correlations between estimates and the 
WAIS criterion were significant (mean r =
.70), with 72% of the estimates within 10
points of the criterion. More confident judges 
were more accurate in their estimates. 

The results indicate that experienced clini- 
cal psychologists can make useful estimates 
of interviewees’ intelligence. 
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STIMULUS GENERALIZATION IN BRAIN 
DAMAGED CHILDREN’ 


SARNOFF A. MEDNICK CYNTHIA WILD 


University of Michigan Yale University 


¹ The authors wish to express their appreciation for the help
and advice of Edith M. Taylor of Children's Hospital, Boston,
Massachusetts, and Chipman [...] the Fernald School, Waltham,
Massachusetts. The work was partially supported [...] United
States Public Health Service Grant No. M 1519 [...] the senior
author while he was at Harvard University.

Stimulus generalization (SG) can be said to have occurred when
a response previously trained to be elicited by stimulus O can
also be elicited by test stimuli similar to O. This phenomenon
has been used extensively in explanation of verbal learning
(Gibson, 1940), social activity (Hull, 1950), and clinical
behavior (Mednick, 1958b). A study by Mednick has suggested
that the behavioral deficit of the brain damaged adult usually
described by the term “concrete” may also be understood in
terms of SG. He found that the SG responsiveness of these
patients was sharply curtailed (Mednick, 1955). Research has
suggested that the concreteness observed in the brain damaged
adult has its counterpart in the child. For example, Cotton was
particularly struck by the similarity between her group of
children suffering from cerebral palsy and the brain damaged
adult (Cotton, 1941). In view of these findings, it seemed
advisable to compare the generalization reactiveness of brain
damaged and intact children. In terms of the previously
obtained results with adults, it was hypothesized that brain
damaged children would demonstrate less SG than intact
children.

Apparatus. The apparatus was adapted from one devised by Brown,
Bilodeau, and Baron (1951). It consisted of a horizontal row of
11 lamps fastened to a flat black, curved plywood panel 6 feet
by 2 feet, mounted on its long edge on [...]. The lamps were
spaced 9 degrees apart and equidistant from the subject’s eyes
when the subject was seated directly in front of, and 3.5 feet
away from, the [...] Lamp 6 being the center lamp. (The [...]
this study were 1, 3, 5, 6, 7, [...] flashlight lamp, 2 inches
above [...] served as a fixation point and [...] experimenter
was seated behind the [...] subject’s view.

Subjects. Thirty-six children [...] study. Eighteen were
patients at the [...] Clinic of the Children’s Hospital in
Boston. [...] majority of them were diagnosed as spastic, [...]
athetoid cases were also included. [...] boys and 8 girls,
ranging in age from [...] years, and in intelligence from
Mental [...] Superior. The 18 children in [...] were [...]
group for age [...] children, with no evidence of organic brain
damage, were matched individual for individual with respect to
IQ and age with the 5 cerebral palsied children with subnormal
IQs; the remaining 13 subjects of normal intelligence in the
Control group were taken from a previous study (Mednick &
Lehtinen, 195[...]). None of the CP children used in the study
had a known visual defect. They all had full use of at least
one arm.

Procedure. Subjects were [...] to lift their hand from the
reaction key as quickly as possible when the center lamp was
lit. They were told that other lamps would be lit occasionally,
but that they were only to respond to the center lamp. Subjects
were encouraged to respond as quickly as possible. The latency
of response was measured to the nearest one-hundredth of a
second with a Standard Electric Timer. Two criteria were
decided upon to determine whether the subject was capable of
performing the task. First, the experimenter went over the
instructions with the subject as many times as necessary for
him to be able to repeat them correctly. Somewhat more
explanation usually proved necessary for the CP child than for
the intact child. Secondly, a behavioral test of the subject’s
ability to understand and perform the task was also employed.
After the instructions, the subject received two
demonstration-test trials. If the subject responded
inappropriately, he was discarded. No intact child was
discarded; [...] CP children were discarded.


TABLE 1
[...] SUBJECTS THAT [...] RESPONSES [...] AT [...]

Ten consecutive training trials with the center lamp (10-15
seconds intertrial intervals) were given. The training trials
were followed without warning by a test series during which six
of the peripheral lamps (Lamps 1, 3, 5, 7, 9, 11) were
presented twice each, interspersed with 17 “booster” trials
with the center lamp in a counterbalanced order. The total
number of trials in the test series was 29, 17 with the center
lamp and 12 with the peripheral lamps. Zero, one, two, or three
center lamp booster trials intervened between successive test
trials with the peripheral lamps. Six different orders were
used for the test trials, each order beginning with a different
peripheral lamp. Three subjects were assigned to each order
from the CP and Control groups. Approximately 50% [...] verbal
reinforcement [...] subject concentrated on the task and
promot[...] optimal reaction times.


TABLE 2
FREQUENCY DISTRIBUTION COMPARING [...] STIMULUS GENERALIZATION
[...] RESPONDED AT EACH [...]

In previous research using voluntary response measures of SG
responsiveness, no relationship has been found to exist between
latency and frequency measures of SG (Gibson, 1939; Mednick,
1955; Mednick & Freedman, 1960; Rosenbaum, 1953) except under
special conditions (Mednick, 1958a). These results were also
observed in this experiment. The two groups did not differ
significantly in mean latency of response on the training
trials (the mean latency for the CP group was [...]; the mean
latency for the Control group was 334) nor was there a
relationship between latency and frequency of response when
these variables were dichotomized and subjected to chi square
analysis.


The frequency generalization data are pre-
sented in Table 1 in the form of the propor-
tion of subjects in each group responding at
least once to a given lamp. As can be seen,
the CP group showed less SG responsiveness 
than the Control group at every lamp. This 
is also reflected in a count of the total num- 
ber of responses made at each lamp by the 
two groups (also in Table 1). 

The first SG test trial is considered an im-
portant indicator of SG responsiveness, since
it is relatively untainted by the effects of
discrimination and extinction. On this test
trial 14 of the 18 Control subjects responded,
while only 7 of the 18 CP subjects responded.
This difference is significant (chi square, cor-
rected = 4.11, df = 1, p < .05).
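
The corrected chi square for the first test trial can be checked by hand. The sketch below is not part of the original report. It assumes the test was a 2 x 2 chi square with Yates' continuity correction, applied to the counts given in the text (14 of 18 Control subjects and 7 of 18 CP subjects responding). The function name `yates_chi_square` is hypothetical.

```python
def yates_chi_square(a, b, c, d):
    """Yates-corrected chi square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Continuity correction: reduce |ad - bc| by n/2, never below zero.
    diff = max(abs(a * d - b * c) - n / 2, 0)
    return n * diff ** 2 / (row1 * row2 * col1 * col2)


# Rows: Control, CP. Columns: responded, did not respond.
chi2 = yates_chi_square(14, 4, 7, 11)
print(round(chi2, 2))
```

Under these assumptions the computed value agrees with the 4.11 reported in the text.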

Table 2 presents a frequency distribution 
comparing the SG responsiveness of the CP 
and Control children. While none of the CP 
children gave more than four SG responses, 
11 or 61% of the Controls showed five or 
more responses. The group differences are 
significant (chi square = 15.84, df=2, p 
< .01). This test was performed by combin- 
ing Rows 0-2, Rows 3 and 4, and Rows 5-8, 
collapsing Table 2 into a 3 X 2 table. 


DISCUSSION 


The hypothesis that the brain damaged 
children would evidence a diminished degree 
of SG responsiveness is supported by the re- 
sults. It seems likely that this finding may 
help explain the behavior of the brain dam- 
aged child, which has been described as con- 
crete. An often-cited clinical example of con- 
crete behavior concerns the child who has 
been trained to complete a task seated in a 
certain way at a certain table. However, 
when his position is altered or table changed, 
he is no longer able to perform the task. 


Clearly, this could also be explained as an
instance of failure of SG. The second stimu-
lus situation differed from the first; SG did
not occur.

This way of thinking of the problems of
these children has certain advantages. For one
thing, we can look at the teaching materials
for these children in a more differentiated
manner. If we want the child to respond with
the same response to two different stimulus
situations (grasping an “abstract” concept), we
should eliminate all unessential differences in
the stimuli, since these will hamper generali-
zation. In addition, we have an experimental
literature in SG (recently reviewed by Med-
nick & Freedman, 1960), on which we can
draw for suggestions or manners to augment
SG responsiveness. Thus, it has been shown
that greater SG responsiveness is manifested
under higher drive levels (Brown, 1942;
Mednick, 1957; Rosenbaum, 1953). In ad-
dition, within limits, greater training in giv-
ing a response to a stimulus will result
in augmented SG responsiveness to similar
stimuli (Margolius, 1955; Thompson, 1959).


SUMMARY 


The hypothesis that brain damaged chil- 
dren suffer reduced SG responsiveness was 
tested and supported. SG was measured along 
a visual-spatial dimension with an apparatus 
that required a voluntary response. Some ob- 
servations were made regarding the implica- 
tions of this finding for the training of the 
brain damaged child. 
THE EFFECT OF INSTRUCTIONAL TIME INTERVAL AND
SOCIAL DESIRABILITY ON THE VALIDITY OF A
FORCED-CHOICE ANXIETY SCALE


LAWRENCE P. BERGS AND BARCLAY MARTIN


University of Wisconsin


If one conceptualizes anxiety as an emotional arousal state
that varies from situation to situation and even in the same
situation from time to time, it becomes important to take this
variability into account in the construction of any paper and
pencil test designed to measure the construct. Ordinarily the
instructions accompanying most paper and pencil tests of
personality traits are vague with respect to the time interval
for which the person is rating himself, although the usual
implication is that the person is answering the items in terms
of how he has generally been during his life.

The first aim of the present research was to administer the
same anxiety scale to separate groups with instructions given
in terms of three different time intervals, and observe how the
correlations of the scale with a subsequent criterion
assessment of anxiety in a specific stress situation were
affected. The time intervals selected were “last 2 weeks,”
“last 6 months,” and “in general.” The criterion assessment was
made in individual sessions about a month later. It was thought
that the “last 2 weeks” and “in general” groups would show less
correlation with the criterion measure than the “last 6 months”
group. “Last 2 weeks” should be low because this time interval
would be most affected by transient fluctuations, and “in
general” should be low because it covers too long a time
interval, whereas “last 6 months” might more likely tap the
more recent characteristic level of anxiety.

The second aim of the study was to vary the degree to which
variance attributable to the tendency to respond in terms of
the social desirability of the item was removed from the scale.
An attempt to remove this source of variance was made by
applying different scoring procedures to a forced-choice format
of item triplets similar to that used by Heineman (1953). One
scoring procedure was thought to minimize social desirability
variance, a second to be heavily affected by it, and a third
was devised to measure more directly the social desirability
variance itself. It was expected that the procedure that
minimized social desirability variance would yield the scores
most highly correlated with the criterion.

It was also expected that social desirability variance, itself,
would be differentially affected by the three instructional
time intervals. First, it is reasonable to suppose that a
person would be more willing to admit to socially undesirable
attributes if he were saying that these were true for a 2-week
period than for 6 months or in general, and accordingly the
mean social desirability scores should decrease with increasing
time intervals. Secondly, if one of the scoring procedures does
indeed successfully remove the social desirability variance,
the mean scores for that procedure should vary less as a
function of instructional time interval than the scores for the
procedure that includes more of this variance.

Previous research is meager with respect to showing
differential validity as a function of the degree to which
social desirability variance is removed from paper and pencil
anxiety measures. Edwards (1957) has pointed up the
pervasiveness of this variance and developed a scale to measure
it. Heineman (1953) attempted to rid the Taylor MA scale of
this variance by constructing a forced choice version, and
showed that the correlation with the MMPI K scale could be
reduced. Silverman (1957) found that Heineman’s forced-choice
form correlated .24 (p < .05) with base level palmar
conductance obtained before a stress session, whereas the
regular Taylor MA scale correlated only -.02. Martin (1959)
reported a correlation of .44 between base level palmar
conductance during stress and a forced-choice scale composed of
adjective triplets taken immediately after the stress session
and a correlation of -.02 between the same measure and the
regular Taylor MA scale taken earlier in a group session. The
correlation of .44 was obtained in a group that took the
adjective triplets scale with instructions to answer in terms
of how they had just been feeling during the stress session.
Two other groups that were told to answer in terms of how they
had been feeling during the last month and in general,
respectively, showed no significant correlations with palmar
conductance.


PROCEDURE


Construction of the Forced Choice Scale


The 20 items from the Taylor MA scale which [...] independent
item analyses by Bu[...] (1955) and [...] and Magoon (1954),
had been [...] between criterion groups, were used [...] items
in the present scale. These were [...] items used by Bendig
(1956) in the development of a short form of the Taylor MA
scale. Twenty-eight other items were selected from the MMPI on
the basis of a priori judgment as to their not being directly
related to anxiety and their involving personality
characteristics that were subject to some fluctuation. The
wording of the items was changed, where necessary, so that all
items were stated in the past perfect tense. This was done to
make it appropriate to answer the items in terms of a specific
past time interval.

All 48 items were then rated for social desirability by 110
students from an introductory psychology class on seven-point
rating scales. Forty triplets of items were then composed
following the format of Heineman (1953) in which an anxiety
item was paired with a nonanxiety item of equal social
desirability, and a third nonanxiety item was added which
differed by approximately two scale units in social
desirability (either plus or minus) from the first two items.
Each anxiety item appeared twice in the 40 triplets.


The Scoring Procedures


In taking this inventory, subjects were asked to select the
item in each triplet that was most like them and the item that
was least like them. Scoring Procedure A was rather complicated
and represented an attempt to remove social desirability
variance. In brief, the scheme was as follows:


Anxiety item nonmatched item 
least: 

Anxiety 
matched item least 

Nonanxiety 
iety item not marked 

Nonanxiety nonmatched item mo 
iety item least: 

Nonanxiety 
item not 

Nonanxiety 


most, 


item most, nonanxiety 


nonmatched item mos 


1 point 
matched item most 
marked 


matched item 


1 point 


item least 0 point 


The behind this 
illustrated by examining the 


h is pe 

point and 0-point com 
In the former it can be 
that the 
If putting himself ir 


logic approas 


seen that the 


binations 
ject is saving matched nonar 
like 

i li 


lavorabDlie high 


least him 


item as leas 
} 


ing the anxiety 
tion of ] 


social ility 
anxiety 


desira 
item is marked most 
B consiste 
the anxiety item was kee 
ft unmarked, and 0 points 
This should be influences 

ocial desirability variance 
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Scoring Procedure 
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variance it- 

nature of the triplet 
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The variable 
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dimension 
was scored as follows, with a high 


the tendency to say unfavorable things about one 
self: 2 points if nonmatched 
marked least like; 1 point if left 


points if marked most like 


item wa 


unmarked; and 


nonanxiety 


Subjects and Group Testing


Small groups of volunteer subjects from an intro-
ductory psychology course were seen until a total of
40 male and 40 female subjects in each of the three
instructional conditions had been administered the
Forced-Choice Anxiety scale. The three instructional
conditions were obtained by asking the subjects to
answer the scale in terms of how they had been dur-
ing (a) the last 2 weeks, (b) the last 6 months, or
(c) in general.

TABLE 1

CORRELATIONS AMONG THE THREE
CRITERIA MEASURES

                       Initial conductance    Rating
Measure                Male     Female        Male     Female
Systolic change        31 [...]
Initial conductance    [...]


The Individual Stress Session


A random sample of 40 subjects (20 male, 20 female) was
selected from each of the larger group tested samples, and
contacted for the individual session which occurred on the
average about a month after the group session. A more complete
description of the stress procedure may be obtained in Bergs
(1960). Briefly, the subject was confronted in close 
proximity by two experimenters in a small room. The 
experimenters watched the subject closely throughout 
the session and rather obviously made notes and rat- 
ings. The subject was told at the beginning, 


In this experiment we are going to ask you to do 
several things. First, we will ask you to tell us 
what you see on a Rorschach test ink blot. For 
the second part we will ask you to tell us what- 
ever comes to your mind. We believe that your 
telling us everything that comes to your mind and 
your responses to this Rorschach card, together 
with this apparatus [points to GSR apparatus], 
will help us understand what your hidden feelings 
and emotions are, and tell us something about the 
kind of person you are. But first we want you to 
sit silently for another couple of minutes before 
we get started. 


Following this anticipation period, the experi-
menter turned on a tape recorder and presented Ror-
schach Card II. After the subject had responded, the
experimenter commented, “Those responses are not 
as well integrated as they might be.” 

The subject was then asked to freely associate for 
a couple of minutes, during which he was again 
mildly criticized for not “really” saying everything 
that came to mind. 


Continuous recording of palmar skin conductance 
and measures of blood pressure at predetermined in- 
tervals were obtained during the session. At the end 
of each session the two experimenters rated the sub- 
ject on a seven-point scale in terms of how mani- 
festly anxious the subject appeared to be during the 
session. The correlations between the independent 
ratings of the two experimenters were .62 for the 
female subjects and .73 for the male subjects. On 
the basis of the intercorrelations among the stress 
measures, a criterion index of anxiety was composed 
based on the sum of the standard scores for (a)
base level conductance, (b) change in systolic blood
pressure, and (c) the average of the two experimenters’
anxiety ratings. Intercorrelations among the three
measures composing the index are shown in Table 1.
There were no significant differences between the
means of these scores for males and females.

RESULTS AND DISCUSSION 


Correlations between the criterion index of 
anxiety and the Anxiety scale scores for the 
various instruction groups and scoring pro- 
cedures are shown in Table 2. The correla- 
tions are presented separately for males and 
females, and it is apparent that the male sub- 
jects yielded no significant correlations under 
any condition. The negative correlational re-
sults for the male subjects suggest caution in
interpreting the other findings; however, as 
will be seen the female subjects yield results 
highly consistent with the theoretical expec- 
tations. 

TABLE 2

CORRELATIONS BETWEEN THE ANXIETY INDEX AND THE THREE SCORING
[...] IN THE DIFFERENT GROUPS

Scoring procedure
A                      [...]
B                      [...]
Social desirability    [...]

Note.—[...] of .44 is significant [...]


For the female subjects the highest correlation, .62, is for
the “6-month” instruction group for Scoring Procedure A, the
one designed to reduce social desirability variance. The
correlation for Scoring Procedure A under “in general”
instructions, .49, is also significantly different from zero
(p < .05) but not significantly different from the correlation
obtained for the “6-month” instruction group. The correlation
of .10 for the “2-week” instruction group is significantly less
(p < .05) than the correlation obtained for the “6-month”
group. Thus, as expected, for the female subjects, the “2-week”
instruction does have the lowest predictive validity, perhaps
because it reflects unduly the more transient states of
anxiety. And although the difference between the “6-months” and
“in general” correlations is in the expected direction, the
difference failed to reach significance.

Scoring Procedure B, where no attempt was made to reduce social
desirability variance, did not yield any significant
correlations. By employing the somewhat dubious procedure for
testing for the significance of the difference between
correlations based on the same subjects (McNemar, 1949,
p. 124), it was found that the correlations for Scoring
Procedure A were significantly higher (p < .05) than the
correlations obtained with Scoring Procedure B for both the “in
general” and “6-months” groups.

None of the correlations for the social desirability scoring
procedure was significant, although the correlations were of
substantial magnitude for both the “in general” and “2-weeks”
groups.


TABLE 3

MEANS AND SIGMAS OF THE THREE SCORING PROCEDURES FOR MALES
AND FEMALES COMBINED

Instruction group      Scoring procedure [...] Social desirability
                       Mean [...]
In general             10.06 [...] 14 7 [...]
2 weeks                9.93 [...]
6 months               [...]


The means and sigmas for the different instruction groups and
different scoring procedures are shown in Table 3. There were
no significant differences between male and female subjects for
any of these means and, accordingly, the two sex groups were
combined to yield an N of 80 for each instruction group. It can
be seen that there is a general
tendency for the means to increase as the in-
structional time interval decreases. For Scor-
ing Procedure A this tendency is not signifi-
cant as tested by analysis of variance. For
both Scoring Procedure B and the social de-
sirability scoring procedure, there is a sig-
nificant effect (p < .05) of instructional time
interval upon these mean scores. These re-
sults are consistent with the theoretical ex-
pectation that subjects would admit to more
unfavorable characteristics as the time inter-
val decreases. However, one cannot conclude
that Scoring Procedure B manifests this effect
more than Scoring Procedure A. A correlated-
measures analysis of variance was performed
on the A and B scoring procedures, and the
interaction of scoring procedure by instruc-
tional time interval was not found to be sig-
nificant.

In conclusion, the results of the present re-
search indicate that both the instructional
time interval and social desirability variance
affect the predictive validity of a paper and
pencil test of anxiety. It was also found that
subjects are likely to say more unfavorable
things about themselves when the time inter-
val being reported on is short.

It was not the purpose of this paper to
publish a new psychometric test and, in fact,
item analyses of the present scale (not re-
ported in this paper) suggest that many of
the item triplets are not predictive at all. The
completely negative results for the male sub-
jects emphasize this point.


SUMMARY 


The primary purpose of this research was 
to study the effect of instructional time in-
terval and social desirability variance upon
the validity of a forced-choice anxiety scale. 
Subjects in three different groups were asked 
to answer the scale in terms of how they had 
been during the last 2 weeks, last 6 months, 
and in general. The forced-choice triads were 
then scored by Procedure A, which was de- 
signed to reduce variance associated with the 
social desirability of the items, and by Pro- 
cedure B, which was presumed to be heavily 
affected by the social desirability of the items. 
A criterion index of anxiety was obtained in 
an individual stress session, and was based on 
skin conductance level, change in systolic 
blood pressure, and a rating of anxiety. 

The results were entirely negative for the 
male subjects; no significant correlations were 
found for any instruction group or for either 
scoring procedure. For the female subjects the 
results were in accord with the theoretical ex- 
pectations. The highest predictive validity 
was obtained for the “6-month” instruction
group for the scoring procedure that was de- 
signed to minimize social desirability vari- 
ance. The correlation with the criterion was 
also significant for the “in general” instruction 
group for Scoring Procedure A. No criterion 
correlations were significant for Scoring Pro- 
cedure B. 

A second aim of the research was to study 
the effects of instructional time interval upon
the mean social desirability scores, which
were assessed by a third scoring procedure. 
As expected, subjects tended to admit to more 
socially undesirable characteristics as the in- 
structional time interval decreased. 
RESPONSE STYLES IN CLINICAL
AND NONCLINICAL GROUPS 


H. J. WAHLER 


Health Center, Ohio State University 


Over the past decades, conceptions of per- 
sonality measurement have undergone various 
transitions. One change is reflected in an in- 
creased interest in molar response character- 
istics conceived as response sets (Cronbach, 
1946), or response styles (Jackson & Messick, 
1958), elicited by the verbal stimuli of ques-
tionnaires. Two response sets have been of
particular interest currently. One, Edwards
(1953) describes as the “social desirability
variable.” He and other investigators found
high correlations in the vicinity of .87 be-
tween rated item social desirability and av-
eraged scores of student groups on self-de-
scriptive questionnaire items. The other is a
response set which Cronbach (1946, 1950)
termed acquiescence or a tendency to agree
(or disagree) with items irrespective of their
content. In their recent review, Jackson and
Messick (1958) conclude that:


In the light of accumulating evidence, it seems likely
that the major common factors in personality inven-
tories of the true-false or agree-disagree type . . .
are interpretable primarily in terms of style rather
than specific item content (p. 247).

If the major common factors in personality 
inventories are interpretable in terms of re- 
sponse (R) styles, then two groups which 
differ significantly on a scale of psychiatric
symptomatology should also differ signifi-
cantly in terms of R styles shown by the
members. A sample of patients that cannot
be so discriminated from controls should not
show different proportions of R styles than
controls. Also subgroups of subjects from dif-
ferent clinical and nonclinical groups who ex-
hibit the same R styles on one questionnaire
should have comparable scores on different
scales. Furthermore, if R styles may be in-
terpreted as the major common factor rather
than specific item content, scores obtained by
subjects with one set of personality items
should covary with their scores derived from 
different scales with different content, the di- 
rection being consistent with the bias of their 
R style. 

Messick and Jackson also point out that R 
set studies have tended to focus on one or 
another R style such as acquiescence or social 
desirability without studying both conjointly. 
One point which bears special attention
is the possibility that the set to agree
or disagree may interact with item desir-
ability.

The purpose of this study is to investigate
the above propositions which may be briefly
restated as four questions: (a) Are significant
differences found between clinical and non-
clinical groups in terms of the frequency with
which different R styles occur when the clini-
cal group can be differentiated from controls
by a scale of general psychiatric symptoma-
tology, and when clinical and control groups
cannot be so discriminated? (b) Do subjects
who show the same R styles in self-ratings ob-
tain comparable scores with true-false scales
in spite of their being members of different
clinical and nonclinical groups? (c) Do sub-
jects exhibit the same R sets with different
items and modes of responding? For example,
if they tend to deny traits on a self-rating
scale do they show the same tendency with
another set of items and a true-false mode of
responding? (d) Is the tendency to claim or
deny undesirable traits related to claiming or
denying desirable traits or are these tenden-
cies independent? Do the clinical and control
samples differ in this respect?


PROCEDURE

Response Styles


Couch and Keniston (1960) have shown a signifi-
cant correspondence between average level of re-
sponse to items rated on a seven-point scale and the
number of “true” responses to MMPI items. This
they conclude shows that the R set to agree or dis-
agree is demonstrable with both methods of respond-
ing. If this is the case, four different R styles may be
readily defined based on the level of individual sub- 
jects’ responses to a self-rating inventory. These in- 
dices reflect the three response sets of major current 
interest, namely, the set to agree, to disagree, and to 
give responses that correlate positively with per- 
ceived social values associated with items, and a 
fourth style which is the opposite of the latter. With 
a self-rating inventory containing items judged de- 
sirable (D) and other items judged undesirable (U), 
a two-way classification of scores in terms of level 
(high-low mean ratings) and item desirability (D-U) 
can serve to define the four R styles. A low score on 
either D or U scales was defined as one below the 
median of the distribution of scores for all subjects 
combined. A high score was defined as lying above 
the median of combined distributions. Subjects who 
rate low on both D and U scales are showing a tend- 
ency to deny or disagree irrespective of content and 
perceived social values of items. Subjects rating high 
on both D and U scales are exhibiting an R style 
indicative of an agreeing set. Subjects who rate high 
on D and low on U are, by definition, manifesting a 
social desirability R style. The fourth style evolves 
logically from the two-way classification of scores.
Subjects rating low on D and high on U scales would
fall in this class. Logically this R style can be de-
scribed as the antithesis of a social desirability set,
a social undesirability set.
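
The two-way classification above can be sketched as a median split on each scale. This is a minimal illustration, not the author's scoring routine: the function name, the style labels, and the sample scores are hypothetical. Scores falling exactly at the median are counted as low here, which is an assumption, since the text defines only scores below and above the median.

```python
from statistics import median

# (high on D, high on U) -> R style label (labels are illustrative).
R_STYLES = {
    (False, False): "disagree",
    (True, True): "agree",
    (True, False): "social desirability",
    (False, True): "social undesirability",
}


def classify_r_styles(d_scores, u_scores):
    """Assign an R style to each subject from mean D and U self-ratings."""
    # Medians are taken over all subjects combined, as in the text.
    d_med, u_med = median(d_scores), median(u_scores)
    # Assumption: a score equal to the median is treated as low.
    return [R_STYLES[(d > d_med, u > u_med)]
            for d, u in zip(d_scores, u_scores)]


# Hypothetical mean self-ratings for four subjects.
styles = classify_r_styles(d_scores=[2, 8, 7, 3], u_scores=[2, 8, 3, 7])
print(styles)
```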


Self-Description Inventory 


The self-description inventory (SDI) contains 44 
items with heterogeneous content pertaining to 
characteristics of common clinical interest such as 
anxiety, hostility, sexual adjustment, interpersonal 
adjustment, dependency, etc. This content is phrased 
in the first person with nontechnical language, e.g.,
“I have trouble getting along with people.” Subjects
are instructed to rate themselves on these items by 
means of a nine-point scale with anchoring state- 
ments ranging from “not at all like me” to “beyond 
question very much like me.”

Twenty-seven of the items which had been in- 
dependently rated as slightly to highly undesirable 
(mean ratings of less than five with a nine-point 
scale of D-U) were selected in conjunction with a 
different study, Wahler (1958), on the basis of their 
ability to discriminate mental hygiene intake and 
nonpatient groups at the .05 or less level of signifi-
cance. These 27 items comprise the U traits scale
used in classifying R styles. Eight items which were 
judged slightly to highly desirable (mean ratings of 
greater than 5) make up the D traits scale. The U 
score for each subject is the mean of his self-ratings 
on the U items and the D score is the mean of his 
self-ratings on the D items. 


MMPI Scales 


Three MMPI scales were selected as measures in
this study since they contain a variety of content
and require a true-false mode of responding which
differs from the self-rating approach with the SDI.
Norma Besch and James Taylor kindly made data
collected by them available to the author. Besch had
71 male undergraduates at Ohio State University
rate 200 MMPI items for “personal desirability” on
a nine-point scale; the Pt and L scales were included
among the items. Taylor obtained social desirability
ratings from 81 adult normals on 205 MMPI items
which included the K scale. From these ratings it was
evident that items making up the K, L, and Pt scales
are perceived as primarily undesirable. Seventy-three
percent of the K items were judged undesirable. The
22 undesirable K items had a mean median rating of
3.79 on a nine-point scale. Eighty-one percent of the
48 Pt items were judged undesirable with a mean
mean rating of 3.28 for the 39 undesirable items.
Eighty percent of the L items were judged U with a
mean mean rating of 4.13. Scores from each of these
scales accrue mainly from responses in one direc-
tion. The K score increases with the number of items
denied, except for 8 items. The magnitude of the
L score is also a function of denying items in all
cases but 3 out of 15. Pt scores, on the other hand,
are mainly a function of the number of items claimed
or agreed with; this is true for 39 out of 48 items.
Furthermore, the item overlap is negligible among
these three scales with only one common item (J-51)
scored in the same direction on the K and L scales.


Subjects 


The nonpsychiatric subjects consisted of 26 male 
and 44 female sophomores taking an introductory 
psychology course and 39 male students taking an 
elementary personality course at the State Univer- 
sity of Iowa. The SDI and a “Biographical Inven-
tory” containing 240 MMPI items were administered
to these people in groups. These people were told
that their responses were to be studied as part of a
research project and would not be used in any per-
sonal way.

Two different subpopulations of clinical subjects 
were sampled on the assumption that outpatients 
who voluntarily seek help at a mental hygiene clinic 
are more likely to describe their characteristics 
frankly (as they conceive them) than are hospital-
ized patients who as a group often tend to exhibit
more severe pathology manifested by extreme re- 
ality distortion and resistive defenses exhibited as 
marked denial or negativism and who frequently 
have been pressured by relatives and/or community 
to consign themselves to conditions which they may 
not like and/or believe they don’t need and hope to 
leave by appearing “normal.” 

The Mental Hygiene Clinic (MHC) subjects (as-
sumed more frank) consisted of 47 males in the
process of applying for help with personal problems
at a Veterans Administration Mental Hygiene Clinic.
Their mean age was 32.6 with a range of 22 to 48.
All tests were administered individually in the course
of their evaluation for treatment. Subjects in this
group were given the card form of the MMPI. Also, 
a briefer form of the SDI was given this group; the 
U score is based on 22 of the 27 items and the D 
score is based on 5 of the 8 desirable items. 

[TABLE 1. SDI and MMPI scores for the VAH, MHC, and student groups, with pairs of scores compared. Cell values are not recoverable.]

The hospitalized (VAH) group (assumed more
guarded) consisted of 75 patients, 65% of whom
were diagnosed as some type of schizophrenic; the
remaining were diagnosed as other types of func-
tional psychiatric disorders. Their mean age was 34.4
with all but two cases within the range of 18 to 49.
The two exceptions were 60 and 61 years of age. All
subjects were administered the SDI individually dur-
ing admission testing. Tests other than the SDI were
administered at the discretion of the examiners; only
24 were given the card form of the MMPI.


RESULTS


The assumption that patients seeking help
at an outpatient clinic respond more frankly
(less defensively) than hospitalized patients
as a group was well substantiated by the rela-
tive levels of mean Pt, K, and L scores of
these groups in comparison with each other
and controls as shown in Table 1. MHC sub-
jects had significantly higher Pt scores than
controls or hospitalized patients while the
mean Pt of VAH subjects did not differ sig-
nificantly from controls. To the extent that
the Pt reflects psychiatric symptomatology,
the hospitalized patients claimed no more
symptoms than did students. Thus, the VAH
group qualifies as a clinical sample that is not
differentiated from controls by a scale reflect-
ing psychiatric pathology while the MHC
group was so differentiated. Also, the VAH
patients scored significantly higher than MHC
or student subjects on K and L indices of
defensiveness.

[TABLE 2. Frequency of four R styles shown by groups. Columns: socially undesirable (low D-high U), socially desirable (high D-low U), agreeing (high D-high U), denying (low D-low U). Rows: VAH, MHC, Student (F), Student (M). Cell values are not recoverable. Legible comparisons: VAH:MHC, χ² = 33.8; VAH:F, χ² = 8.4, .02 < p < .05; VAH:M, χ² = 3.3, p > .30. Values in parentheses in the table are percentages.]

The most striking difference in frequencies 
of R styles is exhibited by the MHC group; 
as shown in Table 2, 91% of the MHC sub- 
jects had socially undesirable or agreeing 
styles. The over-all differences in frequen-
cies within Table 2 are highly significant (p
< .001). Comparisons of groups in pairs show
that the MHC group differs significantly from
all others. Frequencies of R styles shown by
the VAH group did not differ significantly
from those of control males but did differ
significantly from those of females (.02 < p
< .05).

When these frequencies are grouped in
terms of high or low scores on U, disregard-
ing D, the group differences remain signifi-
cant. When the frequencies are classed in
terms of high or low scores on D, disregard-
ing U, differences between groups are no
longer significant.

Subjects from the different diagnostic groups 
were formed into subgroups based on the R 
styles they showed in responding to the SDI. 
Mean Pt, K, and L scores were computed for 
each of these subgroups. In Table 3 it may 
be seen that the mean Pt scores within each 
column are quite comparable for members of 
different diagnostic groups with the same R 
styles. None of the F tests for significance of 
differences within columns attained signifi-
cance at the .05 level.

When row means are compared (within 
diagnostic group across R styles) it is evi- 
dent that within each of the four diagnostic 
groups, subjects who had socially undesirable 
or agreeing R styles obtained higher mean Pt 
scores than did those with denying or social 
desirability styles. Differences between these
means for groups with significant F tests
across R styles were significant at the .05 or
less level, as determined by t tests of pairs of
means, with one exception where the Pt mean 
in the socially undesirable category was not 
significantly different from that in the deny- 
ing category for female students. None of the 
mean Pt scores of subjects exhibiting denying 
or social desirability R styles differed signifi- 
cantly; this was also true for subjects show- 
ing socially undesirable or agreeing styles. 
The same analyses were computed for K 
scores. In this case, there were no significant 
differences within R styles (columns) except 
in the social desirability category. Here, the 
VAH group obtained significantly higher K 
scores than did either of the student groups. 
In Table 4 it may be seen that subjects who
had socially undesirable or agreeing R styles
irrespective of diagnostic group membership
obtained lower mean K scores than did sub-
jects in the social desirability or denying cate-
gories. Again, t tests of differences between
combinations of K means taken two at a time
for the two student groups (groups with sig- 
nificant Fs across R style categories) yielded 
significant differences at the .05 or less level 
between K means obtained from subjects with 
denying or social desirability styles and those 
with socially undesirable or agreeing sets. 

Comparable analyses of the mean L scores
showed different trends than were found with
K or Pt. It may be seen in Table 5 that, in
general, VAH patients have the highest L
scores irrespective of R styles except for the
small number of MHC cases who exhibited
the denying style. There were no significant
differences between the two student groups
either across sex classifications or across R
style categories. The mean L scores of MHC
cases in the socially undesirable and agreeing
categories do not differ significantly from
those of students. The VAH group had a sig-
nificantly higher mean L than either student
group in the social desirability and denying
categories; the MHC L was also significantly
higher than mean L scores of students in the
latter category.

The intercorrelations of K, L, and Pt scores
and mean U and D ratings are presented in 
Table 6. Nineteen out of the 24 correlations 
among scales composed mainly or entirely of 
undesirable items (K, L, Pt, and U) are sta- 
tistically significant at the .05 or less level 
and in all instances except one (L:U) the 
coefficients are in the same direction across 
groups. The magnitudes and directions of 
these coefficients indicate that subjects tend 
to deny or claim socially undesirable traits 
with some degree of consistency on different 
scales. The mean ratings on desirable items 
were correlated with scores from the scales 
comprised of undesirable items; only 5 out of 
the 16 coefficients were significant and the 
directions of the coefficients across groups 
were not consistent. 
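
The totals of 24 and 16 coefficients follow from the design: four scales built mainly of undesirable items, correlated pairwise within each of four groups, plus the D ratings correlated with each of those scales in each group. A small sketch of that count, in which the variable names are illustrative and the group labels are taken from the study's four diagnostic groups:

```python
from itertools import combinations

undesirable_scales = ["K", "L", "Pt", "U"]
groups = ["VAH", "MHC", "Student (F)", "Student (M)"]

# Every unordered pair of undesirable-item scales, computed once per group.
scale_pairs = list(combinations(undesirable_scales, 2))
n_undesirable_coefficients = len(scale_pairs) * len(groups)

# The D ratings paired with each undesirable-item scale, once per group.
n_desirable_coefficients = len(undesirable_scales) * len(groups)

# Proportions significant, using the counts reported in the text.
share_undesirable_significant = 19 / n_undesirable_coefficients
share_desirable_significant = 5 / n_desirable_coefficients
```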


DISCUSSION 


The assumption that a “frank” clinical 
group should exhibit significantly different 
proportions of R styles relative to controls 
was supported by the findings. It was also 
found that the various R styles occurred with 
about the same frequency in a comparable 
control group and a group of clinical cases 
that could not be discriminated by a scale of 
psychiatric symptomatology. Both of these 
findings are in accord with Jackson and Mes- 
sick’s conclusion that the major common fac- 
tors in personality inventories are interpret- 
able in terms of response style. 

From their conclusion it would also be pre- 
dicted that subgroups of subjects from differ- 
ent clinical and nonclinical groups who exhibit 
the same R styles on one questionnaire should 
have comparable scores on different scales. 
For two of the MMPI scales this was found 
to be the case; with the L scale, hospitalized 
subjects scored consistently higher than other 
subjects, irrespective of R styles. Eichman 
(1959), in comparing female schizophrenic 
patients with controls, also found that high 
L scores characterized his samples of hos- 
pitalized patients. 

It was found that some scales with different 
content reflect the bias of the respondee’s R 
style in a consistent direction, but not others. 

The correlational analyses provide evidence 
that members of all groups tended to respond 
with some degree of consistency to scales com-
posed of socially undesirable items. Correla-
tions of the various scales with mean self-rat- 
ings on D items failed to show any consistent 
trends and the majority of the coefficients 
were not significant. These findings are in op- 
position to an assumption that a general ac- 
quiescence set may operate independently of 
the judged desirability or undesirability of 
items. In general, the above findings consti- 
tute or suggest interactions. 

It was not possible in this study to use a 
four factor analysis of variance design because 
the limited number of clinical subjects made 
it infeasible to in order to 
achieve the proportionality of scores required 
for such analyses. However, four factors are 
represented in the more fragmented series of 
analyses: (a) samples from different sub- 
populations (this group’s factor is confounded 
with situational factors); (b) different scales,
and two categorical variables used in classi- 
fying subjects; (c) high or low scores (claim- 
ing or denying tendencies) on self-rating 
scales; and (d) classifications of items as so- 
cially desirable or undesirable on the basis of 
independent ratings. 

The essential R style factor found in this 
study was what DeSoto and Kuethe (1959)
term the “symptom-claiming set’”—a disposi- 
tion on the part of subjects to claim (or deny) 
undesirable symptomatic traits. But this fac- 
tor is hardly pure since it interacts with scales 
—which may in turn be a function of differ- 
ent types of content. A Type I design (Lind- 
quist, 1953) was used to test the significance 
of the scales by R styles interaction; this in- 
teraction was significant (p < .001) compar- 
ing MMPI scores on K, L, and Pt for all sub- 
jects classed as denying undesirable traits on 
self-ratings and subjects tending to claim un- 
desirable traits. It was also possible with this 
material to analyze the group by scales inter- 
action, which likewise was significant at less 
than the .001 level. Analyses of frequency of 
R styles across groups showed an item de- 
sirability by groups interaction. While a more 
elegant complete factorial analysis was not 
possible, the evidence clearly points toward 
multiple interaction effects among the vari- 
ables considered. 

If the general implications of these findings 
are borne out by additional and more explicit 
evidence, Jackson and Messick’s statement
might need rephrasing. For example, it might 
have to be changed to read “it seems likely 
that the major common factors in personality 
inventories of the true-false or agree-disagree 
type are interpretable primarily in terms 
of .. .” R styles which in turn are functions 
of interacting variables such as social desir- 
ability of items, certain specific types of item 
content, differential characteristics of the sub- 
populations sampled, and the circumstances 
under which subjects are tested. 


SUMMARY 


Several implications follow from the propo- 
sition that the major common factors in per- 
sonality inventories are interpretable mainly 
in terms of response styles. Among them are 
the following three hypotheses: (a) Groups 
which differ significantly in terms of scores 
on a scale of symptomatology should also
differ in the frequency with which various R
styles are shown by the members, and that
groups which cannot be differentiated on the
basis of such scores should not exhibit sig-
nificantly different frequencies of R styles.
(b) Subjects who exhibit the same R styles
on one instrument should have similar scores 
on different scales even though they are mem- 
bers of different subpopulations. (c) Subjects’ 
R sets should be manifested in a consistent 
direction on different scales. A question may 
also be raised as to whether R sets operate 
independently of or interactively with the 
social desirability values associated with items 
comprising scales. 

The findings show that a group of “frank” 
outpatient subjects exhibit a significantly dif- 
ferent frequency of R styles than controls and 
“guarded” patients. A group of “guarded” 
hospitalized patients who scored at the same 
level as controls on a scale of symptomatology 
could not be discriminated from controls by 
differential frequency of R styles. Subjects 
from different diagnostic groups classed ac- 
cording to the R styles they showed on self-
ratings had similar mean Pt and K scores
within each R style category but not similar
L scores. Subjects’ tendencies to deny or claim 
undesirable characteristics were exhibited rela- 
tively consistently in terms of mean scores on 
self-ratings and the Pt and K scales. Different 
diagnostic groups responded differently to the 
L scale irrespective of R styles. Scores on 
scales comprised of items describing undesir- 
able traits were found to covary in a consist- 
ent direction. No such consistency was found 
when these scores were correlated with a scale 
composed of desirable items. 

Consistent response characteristics inter- 
pretable as R sets or styles were found, but 
it was inferred from the findings that multiple 
interaction effects are likely among variables 
such as (a) the particular type of R style 
shown, (b) subpopulation differences, (c) de-
sirability and undesirability of items com- 
prising scales, and (d) other scale differences 
such as types of content. 


THE CLINICAL UTILITY OF “INVALID”
MMPI F SCORES


MALCOLM D. GYNTHER

Washington University Medical School


Many investigators (e.g., Astin, 1959; Gil- 
berstadt & Duker, 1960; Rempel, 1958) 
eliminate MMPI profiles containing high F 
scores from analyses on the grounds that such 
profiles are not valid. This procedure is con- 
sistent with early injunctions to omit such 
profiles from research work (Hathaway & 
Meehl, 1951), but inconsistent with more 
recent views or experimental findings which 
emphasize the characterological implications 
of scores on the validity scales (Gough, 1956b;
Gross, 1959), rather than test taking attitudes
as such. Determination of the relationship be-
tween high F scores, diagnostic classification,
and aggressive versus passive criminal behav-
ior would seem to be helpful in demonstrating
whether such “invalid” F scores have any
predictive value.


METHOD 


Test data of all 353 white male court referrals ad- 
mitted to South Carolina State Hospital between Sep-
tember 1957 and August 1960 were examined. Two-
hundred fifty-one completed MMPIs (all cases test- 
able by this method) were found. Five profiles given 
by organic patients were excluded because this num- 
ber of cases was insufficient for analysis as a separate 
category. The remaining 246 cases were sorted into 
subsamples according to the diagnostic impression of 
the psychiatric staff, which was not based on the 
MMPI data. This procedure yielded 194 behavior 
disorders (BD), 29 neurotics (N), and 23 psychotics 
(P). Intelligence estimates in the form of Kent-EGY, 
Scale D scores were available for all cases. Means 
for the BD, N, and P groups were 28.16, 27.14, and 
27.43, respectively. (The average range is 24-31, in- 
clusive.) Statistical analyses revealed no significant 
differences between the groups which suggests that 
whatever differences there are between distributions 
of F scores cannot be attributed to differences in 
intelligence. Mean ages for the BD, N, and P groups 
were 30.31, 39.96, and 37.83 years, respectively. Sta- 
tistical analyses showed that the BD group is sig- 
nificantly younger (p < .01) than either of the other
groups.


Different investigators use different F values as a
basis for discarding data. Sometimes the reader is
informed only that cases were removed because the
validity scores were “high” (e.g., Rosen, 1958; Sop-
chak, 1958), but in most cases the exact cutting
score is given (e.g., Goodstein & Dahlstrom, 1956;
Panton, 1958). In this investigation, high was de-
fined as F > 16 raw score points, following the rec-
ommendation of Gough (1956a) and Meehl (1956).


RESULTS AND DISCUSSION 


Table 1 shows the distribution of F scores 
for the BD, N, and P groups. Mean F scores 
were 8.66, 6.76, and 8.04, respectively. Sta- 
tistical analyses revealed no significant differ- 
ences which implies that differences in the 
total distribution of F scores cannot be at- 
tributed to differences in diagnostic classifica- 
tion. However, there were striking differences 
between the groups with regard to distribu- 
tion of F scores greater than 16. Thirty-nine 
scores fell into this invalid category, with 37
of these being given by individuals classified
as behavior disorders. Percentages of F > 16
scores for the BD, N, and P groups were
19.1, 0, and 8.7, respectively. Chi square, cor-
rected for continuity, showed that these dif-
ferences depart significantly from chance (χ² =
6.04, df = 2, p < .05).
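
The reported percentages and chi square can be checked from the counts given in the text (37 of 194 behavior disorders, 0 of 29 neurotics, and 2 of 23 psychotics with F > 16). The sketch below is an illustration, not the author's computation: the function name is hypothetical, and subtracting 0.5 from every cell's deviation in a 2 x 3 table is an assumption about how the continuity correction was applied. Under that assumption the statistic comes out at the reported value.

```python
def chi_square(table, continuity=False):
    """Pearson chi square for a contingency table given as a list of rows.

    With continuity=True, 0.5 is subtracted from each absolute deviation
    before squaring (assumed here to be the correction the text refers to).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            deviation = abs(observed - expected)
            if continuity:
                deviation = max(deviation - 0.5, 0.0)
            statistic += deviation ** 2 / expected
    return statistic


# (cases with F > 16, group size) for each diagnostic group, from the text.
counts = {"BD": (37, 194), "N": (0, 29), "P": (2, 23)}

percent_high = {g: round(100 * high / n, 1) for g, (high, n) in counts.items()}
table = [[high, n - high] for high, n in counts.values()]

chi2_corrected = chi_square(table, continuity=True)
chi2_uncorrected = chi_square(table)
```

The uncorrected statistic is larger, so the correction makes the test more conservative here.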

The significant age differences between the 
groups raise the question of whether the dif- 
ferences in F > 16 distributions might not be 
explained by the age differences alone. Analy- 
sis of the younger and older halves of the BD 
group showed that the younger subsample 
gave F > 16 scores more frequently than the 
older men (χ² = 5.64, df = 1, p < .02). Also,
the two F > 16 scores found in the psychotic 
group were given by 22-year-old men. Age is 
obviously an important factor. However, if 
the mean age of the BD group is adjusted 
(by eliminating every other subject 30 years
old or younger) so that it is not significantly
different from the P group, the BD group still
had twice the percentage of F > 16 scores
shown by the P group (25/141 or 17.7% versus
8.7%).

It would appear that invalid MMPI F scores
can discriminate between diagnostic
classifications. That is, in groups of male 
court referrals matched for age and intelli- 
gence, behavior disorders obtained 67% of 
the F > 16 scores, psychotics 33%, and neu- 
rotics 0%. And, if one were to consider only 
individuals 23 years of age or older, 100% 
of the F > 16 scores would be obtained by 
behavior disorders. 
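
The text does not say how the 67%, 33%, and 0% figures were derived. One reading that reproduces them, offered here only as an assumption, is that each group's age-adjusted rate of F > 16 scores is expressed as a share of the sum of the three rates:

```python
# Age-adjusted rates of F > 16 scores, using counts given in the text.
rates = {
    "BD": 25 / 141,  # behavior disorders after thinning the younger cases
    "P": 2 / 23,     # psychotics
    "N": 0 / 29,     # neurotics
}

# Assumed normalization: each rate as a percentage of the summed rates.
total_rate = sum(rates.values())
shares = {group: round(100 * rate / total_rate) for group, rate in rates.items()}

bd_percent = round(100 * rates["BD"], 1)
p_percent = round(100 * rates["P"], 1)
```

Read this way, the shares describe groups of equal size, which is what matching would imply, rather than the raw split of the 39 deviant scores.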

These court-referred individuals differ from 
psychiatric patients-in-general in that they 
all have a reason for “faking bad”: to de- 
crease the probability that they will be con- 
victed of the crimes with which they are 
charged. It would be worthwhile to replicate 
this study with routinely admitted psychiatric 
patients to see if our results are substantiated 
by a group with less reason for dissembling.

One check for dissembling in these court-re-
ferred subjects is to test the hypothesis that
faking bad is positively related to the severity
of the crime with which the person is charged.
Analysis of the data given by murderers and
rapists (N = 31) does not support the hy-
pothesis, since the percentage of F > 16
scores given by this subgroup with the most
serious charges against them is nearly equiva-
lent to the percentage obtained by the re-
mainder of the sample (16.1 versus 15.8).
Furthermore, the dissembling interpretation
does not account for the differential F > 16
results found with the different diagnostic
classes.
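
The two percentages are consistent with the totals reported elsewhere in the paper. The sketch below recovers the subgroup's count of F > 16 scores from the reported 16.1% (the count itself is not stated in the text, so it is an inference) and then checks the figure for the remainder of the sample:

```python
TOTAL_N, TOTAL_HIGH = 246, 39   # all cases; cases with F > 16 (from the text)
SUBGROUP_N = 31                 # murderers and rapists (from the text)

# Inference, not a reported figure: which counts round to 16.1% of 31?
candidates = [
    k for k in range(SUBGROUP_N + 1)
    if round(100 * k / SUBGROUP_N, 1) == 16.1
]
subgroup_high = candidates[0]

subgroup_pct = 100 * subgroup_high / SUBGROUP_N
remainder_pct = 100 * (TOTAL_HIGH - subgroup_high) / (TOTAL_N - SUBGROUP_N)
```

Only one count fits the reported percentage, and it leaves 34 of the remaining 215 cases with F > 16.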


With regard to the characterological inter- 
pretation of the F scale, it is interesting to 
note that Leary (1956) considers F to be a 
measure of the aggression and sadism to be 
expected in interpersonal relations. Thus, the 
higher the elevation on F, the more cruel and 
unkind the individual is predicted to behave. 
From that point of view, our subjects who 
obtained F > 16 scores would be considered 
as more aggressive in an antisocial manner 
than the remainder of the sample. Analysis of 
the F > 16 scores obtained by those who 
committed aggressive crimes (i.e., stealing,
rape, murder, etc.) versus those who com-
mitted passive crimes (i.e., forgery, breach of
trust, drunken driving, etc.) shows that there
is a tendency for the aggressive criminals to
obtain F > 16 scores more frequently than
the passive criminals (χ² = 3.04, df = 1, .05 <
p < .10).

[TABLE 1. Frequency distribution of raw scores on the MMPI F scale for behavior disorder, neurotic, and psychotic groups. Cell values are not recoverable.]

The interpretation of MMPI F scores as
indicating aggressiveness also casts some light
on the differential F > 16 scores by diagnosis.
Neurotics, who obtained no F > 16 scores,
tended to commit passive or asocial crimes, 
whereas the behavior disorders and psychotics 
tended to commit antisocial crimes. An illus- 
tration may clarify this point. Of the 14 sex 
crimes committed by neurotics, 8 were incest, 
3 were rape, and 3 were “Peeping Tom.” 
With regard to the 42 sex crimes committed 
by behavior disorders and psychotics, 21 were 
rape or attempted rape, 7 were lewd acts on 
children, 6 were indecent exposure, 5 were 
incest, 2 were sending letters to 
women, and 1 was “Peeping Tom.” This lat- 
ter group would appear to contain a far 
higher percentage of aggressive acts against 
society than the neurotic group, which is con- 
sistent with the differential F > 16 findings 
and the interpretation of F as an indicator of 
aggressive and sadistic behavior. 


SUMMARY


This study investigated the relations be- 
tween “invalid” MMPI F scores, diagnostic 
classes, and aggressive versus passive criminal 
behavior to determine if F > 16 scores, which 
usually lead to elimination of the MMPI 
prior to analysis of the data, have any pre- 
dictive significance. MMPIs were available 
from 246 white male court referrals who were 
classified as behavior disorder (N = 194), 
neurotic (N = 29), or psychotic (N = 23)
by the psychiatric staff. 

Thirty-nine of the 246 subjects obtained 
F > 16 scores. Thirty-seven of these 39 de- 
viant scores were obtained by behavior dis- 
orders. When the data were adjusted to equate 
the groups on age and intelligence, behavior 
disorders were shown to have 67% of the
F > 16 scores, psychotics 33%, and neurotics 
0% of such scores. It was also demonstrated 
that younger men more frequently obtain in- 
valid F scores than older men. Although all 
the subjects had reason to dissemble, the re- 
sults seem most consistent with a charac- 
terological interpretation of the F scale. 

The practice of discarding MMPI data be- 
cause of invalid F scores seems highly ques- 
tionable, especially if the investigator wishes 
to draw valid conclusions about groups, such 
as behavior disorders, who are likely to dis- 
play aggressive, antisocial actions. 


INTERACTION OF BRAIN INJURY WITH PERIPHERAL
VISION AND SET


HAROLD L. WILLIAMS, CHARLES F. GIESEKING, AND ARDIE LUBIN

Walter Reed Army Institute of Research


On tests such as Kohs’ Block Design, the 
Bender-Gestalt, or Benton’s Memory for De- 
signs where designs have to be reproduced, 
the phenomenon called rotation frequently oc- 
curs. The subject reproduces the design cor- 
rectly, but tilts it at an angle to the target 
design, sometimes as much as 45° to 90°.
Rotation has been frequently observed in
brain injured patients, in children, and in 
dull normals (Bender & Teuber, 1948; Gold- 
stein & Scheerer, 1941; Hanvik & Anderson, 
1950; Pascal & Suttell, 1951). 

The Block Design Rotation Test (BDRT¹),
devised by Shapiro (1951), was used in a se-
ries of studies by Shapiro (1951, 1952, 1953)
and Yates (1954) to show that: geometric
properties of the target design had a signifi-
cant effect on rotation, intelligence correlated
negatively with rotation, and reducing peripheral
vision increased rotation in normal subjects.
Williams, Lubin, Gieseking, and Rubinstein
(1956) confirmed these findings but found
that much of the rotation variance was ac-
counted for by intelligence. This effect was
so strong that dull normal subjects could not
be discriminated from brain injured on the
basis of their rotation scores. In addition,
they found an interaction between brain in-
jury and peripheral vision; restricting pe-
ripheral vision did increase rotation for con-
trols but decreased rotation for the brain
injured.

This paper describes two experiments. The 
first experiment demonstrates the existence of 
an interaction between instructions and brain 
injury; calling attention to tilt in the repro- 
duced design benefits normal subjects more 
than brain injured. The second experiment
shows that this interaction effect and the ef-
fect due to reduced peripheral vision can be
replicated, and that the two interactions can
be combined to discriminate dull normals
from brain injured.

¹ In the BDRT the subject uses four blocks taken
from the Wechsler-Bellevue Block Design subtest to
reproduce blue and yellow designs, 1 inch square,
painted on a white card, 6 inches square.


EXPERIMENT 1: EFFECT OF NO-TILT
INSTRUCTIONS

In all previous studies using the BDRT, 
the standard Wechsler-Bellevue Block Design 
instructions were used; i.e., the subject was 
not warned about rotation, he was simply told 
to reproduce the designs as accurately as pos- 
sible. Thus, it was not possible to determine 
how much of the greater rotation of the brain 
injured was produced by their inattention to 
tilt. 

On occasion, subjects were asked to correct 
the tilt in their completed designs. Some brain 
injured subjects were unable to do so, al- 
though they seemed to be trying. Control sub- 
jects generally had no difficulty when the tilt 
was called to their attention. This suggested
that rotation was partly the result of inatten- 
tion, but that the brain injured subjects, in 
addition, had difficulty perceiving rotation. 


In Experiment 1, brain injured patients
were compared with non-brain-injured con-
trols under standard and no-tilt instructions.
Four groups of 20 subjects were used: (a)
brain injured with standard instructions (BS),
(b) brain injured with no-tilt instructions
(BN), (c) controls with standard instruc-
tions (CS), (d) controls with no-tilt instruc-
tions (CN).

Subjects 


Forty male brain injured patients were selected
from the Neurology and Neurosurgery Services at
Walter Reed General Hospital. Table 1 shows the
frequencies for the several types of injuries and a
breakdown of these for approximate lateral localiza-
tion. The category “encephalopathies, n.e.c.” includes
cases with encephalitis, Wilson’s disease, and menin-
gitis.

[TABLE 1. Brain injured subjects by type and location of injury, Experiment 1. Rows: skull fracture; gunshot wound; closed head injury; vascular disorders; neoplasms; encephalopathies, n.e.c.; total. Cell frequencies are not recoverable.]

As can be seen in Table 1, the majority of the brain injured patients had relatively diffuse damage, difficult to localize. They were tested as soon after hospitalization as they were able to cooperate with the examiner and understand instructions. At the time of testing, the length of hospitalization ranged from 1 to 7 months with a mean of 2.4 and a standard deviation of 1.6.

In a brief mental-status examination conducted prior to each test, the examiner judged 14 patients to be disoriented for time and/or place, with impaired memory. Patients were accepted for the study if they showed in practice trials that they understood the standard instructions for the Wechsler-Bellevue Block Design subtest. Prior to the occurrence of brain injury, all patients had been in general good health.

The 40 male controls were selected from the non-brain-injured, nonpsychiatric patient population at Walter Reed General Hospital. They had exhibited no signs of CNS damage on examination at hospital entry. A questionnaire was used to eliminate patients with a history of head injury.

The Army Classification Battery (ACB) (Montague, Williams, Lubin, & Gieseking, 1957), administered at Army entry previous to illness or injury, was available for 30 of the brain injured patients. Thirty of the controls were so selected as to match, individually, these 30 patients on the Pattern Analysis subtest of the ACB and on the time interval between first administration of the ACB and subsequent administration at hospital entry. Pattern Analysis was used because it appears to be a reliable, valid measure of spatial ability, relatively free from the effects of education. Matching on time since initial testing provides some control on age and rank. There were no significant differences in age among the four groups. The ages ranged from 18 to 5[second digit illegible] years with a mean of 28.4 and a standard deviation of 8.3.

² Tables giving additional information on symptomatology, mental status, and special diagnostic studies for each patient are filed with the American Documentation Institute. Order Document No. 6871 from ADI Auxiliary Publications Project, Photoduplication Service, Library of Congress; Washington 25, D. C., remitting in advance $1.75 for microfilm or $2.50 for photocopies. Make checks payable to: Chief, Photoduplication Service, Library of Congress.


Procedure 


The 40 target cards of the BDRT were placed, 1 by 1, in a single marked position on a table 32 inches wide by 50 inches long by 30 inches high. The surface of the table was painted a dull black. The target card was centered with respect to the length of the table and was 15 inches from the subject. The subject made his block designs within a circle of points, 6 inches in diameter, located at the table edge.

In the "no-tilt" groups (BN and CN) attention to correct orientation was induced by instructions, demonstrations of tilt, practice trials, and warnings when rotation occurred during the test. The remaining groups (BS and CS) received standard Wechsler Block-Design instructions.

When the subject indicated that he had completed a design, his reproduction and the target design were photographed with an overhead camera. Later, degrees of rotation from the target design were measured from the film, using a ruler and protractor. Two individuals made independent measures of rotation for each target card, and adjusted their scores after discussion of major disagreements. The adjusted scores showed an average difference of about 2 degrees. The average of the two adjusted scores was used for each card.

Prior to administering the BDRT, all 80 subjects were given the Arithmetic, Vocabulary, and Block Design subtests of the Wechsler-Bellevue. This Wechsler AVB combination was used to estimate intellectual level at the time of the study.


Results 


Figure 1 shows the four groups in relation to the M³ rotation score and the Wechsler AVB measure of intelligence.

The control group receiving the no-tilt instructions is significantly⁴ lower on rotation than the other three groups. If we remove this CN group, the remaining three groups do not differ significantly on M. Table 2 gives the mean and variance of M for each group of 20 subjects, as well as the correlation of M with AVB within each group. By the usual two-way analysis of variance of Table 2, the effect of instructions is significant at the .01 level, and the brain injury effect is significant at the .05 level. The interaction of brain injury and instructions is significant at the .05 level, only if a one-tailed test is used.

The average M scores for the diagnostic classifications shown in Table 1 did not differ significantly, nor was there a significant laterality effect. There was no significant difference between average M scores for the groups judged to be oriented and disoriented in the mental status examination.

³ R, the total degrees of rotation over the 40 cards, has a very skewed distribution. The logarithmic function M = 100 [log R - 1] was used to reduce skew and to reduce nonlinearity of regression on intelligence.

⁴ In this paper "significant" refers to the .05 level or better.
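The logarithmic rotation score described in the footnote can be sketched as follows. The operators in the printed formula did not survive scanning, so this assumes the form M = 100 (log10 R - 1), which gives values in the range of the M scores shown for Figure 1. The function names and the per-card figures in the loop are illustrative, not taken from the study's data.

```python
import math


def rotation_score(total_degrees: float) -> float:
    """Assumed form of the transform: M = 100 * (log10(R) - 1).

    R is the total degrees of rotation summed over the 40 cards.
    """
    if total_degrees <= 0:
        raise ValueError("R must be positive for the logarithm to be defined")
    return 100.0 * (math.log10(total_degrees) - 1.0)


def total_degrees_from_score(m: float) -> float:
    """Invert the assumed transform to recover R from M."""
    return 10.0 ** (m / 100.0 + 1.0)


if __name__ == "__main__":
    # Illustrative inputs: roughly 4 and 12 degrees per card over 40 cards.
    for per_card in (4, 12):
        total = per_card * 40
        print(per_card, "degrees per card ->", round(rotation_score(total), 1))
```

Because the transform is logarithmic, a group mean of M corresponds to a geometric rather than an arithmetic mean of R, so it need not match the average degrees per card quoted in the text.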


Discussion 


When tilt is called to the subject's attention, the controls are able to reduce their rotation to about 4 degrees per card, quite close to the 2-degree average error of measurement. However, the brain injured, even with no-tilt instructions, still average about 12 degrees of rotation per card. These results imply that most of the rotation by patients with recent general brain injury is due to impairment of perception rather than inattention.

In previous studies we were puzzled by the persistent negative correlation of about .5 between rotation and intelligence. Clinical observation suggested that with standard block design instructions brighter subjects perceived the importance of correct orientation. Dull subjects seemed less concerned about the proper orientation of their design. If attention to rotation increases with intelligence, then the strength of the correlation between M and AVB should be reduced in the no-tilt instruction groups. As can be seen in Table 2 this is not true. The nature of the relation between intelligence and rotation remains a mystery.

FIG. 1. Position of the four brain injury-instruction groups with respect to rotation (M) and intelligence. (Horizontal axis: sum of raw scores for Wechsler Arithmetic, Vocabulary, and Block Design; vertical axis: M. The plot itself is not reproduced in this copy.)


EXPERIMENT 2: COMBINED EFFECT OF REDUCED PERIPHERAL VISION AND NO-TILT INSTRUCTIONS ON ROTATION

The purpose of this experiment is to replicate the two interactions found previously and to show that they may be combined to demonstrate a significant difference between dull normals and brain injured. The relation between intelligence and rotation is such that dull normals and brain injured rotate about the same amount. But the interactions of peripheral vision and no-tilt instructions with brain injury are independent of intelligence. It follows that these two interactions could be combined to show a significant difference between dull normals and brain injured.

[Table 2 residue: a column headed "Mean" with the values 165.15, 159.50, 150.40, and 114.35; the group labels and the remaining columns are not recoverable from this copy.]

TABLE 3
CLASSIFICATION OF BRAIN INJURED SUBJECTS ACCORDING TO TYPE AND LOCATION OF INJURY: EXPERIMENT 2

Columns: Type of injury; Location of injury (Left, Right, Bilateral). Rows: Skull fracture; Gunshot wound; Closed head injury; Vascular disorders; Encephalopathies, n.e.c.; Total. (Cell frequencies not recoverable from this copy.)

The reasoning is as follows: Suppose rotation scores are obtained under three conditions, (a) standard instructions, (b) standard instructions combined with reduced peripheral vision, and (c) no-tilt instructions with unrestricted vision. Previous results indicate that brain injured and dull normals rotate equally often on Condition a. For the dull normals we would predict an increase in rotation from a to b, and a large decrease from b to c. For the brain injured patient there should be a decrease in rotation from a to b, and a slight drop from b to c. The difference score, k = b - c, should show a significantly higher mean for the dull normals since it adds the absolute value of the decreased rotation for no-tilt instructions to the increase in rotation due to reduced peripheral vision.


Subjects 


Twenty male brain injured subjects were selected from the Neurology and Neurosurgery Services at Walter Reed General Hospital. Table 3 gives frequencies for the various diagnoses. There are proportionally more bilateral cases, but in other respects the group resembles the brain injured subjects of Experiment 1. At the time of testing the length of hospitalization ranged from 1 to 9 months, with a mean of 2.9 and a standard deviation of 2.2.

Twenty control patients were selected so as to match each brain injured subject on the Pattern Analysis score at time of Army entry, and on time between first and second administration of the ACB. These controls were selected from the same population described for Experiment 1. A dull normal group was formed by selecting 20 control patients who had made a score of 80 or below on Pattern Analysis at the time of Army entry. (A score of 80 is one standard deviation below the mean.) The mean age for the three groups was 26.9, the standard deviation 8.6, and the ages ranged from 17 to 50 years. There were no significant differences between the means of the three groups.


Procedure 


Each subject was asked to designate his preferred eye, and monocular vision was used throughout. Let A designate the first [20] trials of the BDRT. Let B designate the second [20] trials of the BDRT. C represents 20 additional trials obtained by a clockwise, 90-degree rotation of each of the first 20 BDRT cards. Every subject was tested in the same way: first on A with standard instructions, then on B with Shapiro's field reducer, finally on C with no-tilt instructions and unrestricted monocular vision.


Results 


Figure 2 shows the average degrees of rotation per design. As predicted for B, the field reducer condition, rotation increases for the dull normals and normals, whereas the brain injured show a decrease in rotation. All three groups show decreased rotation under no-tilt instructions, but as expected, the brain injured show the smallest improvement.

The field reducer is a mask fashioned from a table tennis ball which covers the eye. It permits central vision through a hole about 6 millimeters in diameter.


FIG. 2. The effect of restricted peripheral vision and no-tilt instructions on rotation for controls, dull normals, and brain injured. (Vertical axis: degrees of rotation per design; horizontal axis: conditions, beginning with A, standard instructions. The plot itself is not reproduced in this copy.)

TABLE 4
CHANGE IN DEGREES OF ROTATION ... REDUCED PERIPHERAL VISION ... INSTRUCTION

Rows: Normals; Dull normals; Brain injured. (Column headings are not recoverable from this copy. Surviving entries as scanned, column assignment uncertain: 16.578, 25.460, 23.958; 98 485, 254.401, 135.052; 83,933, 156.130, 480.944, 206.615, 295 55, 924.4409, 60.949.)


Table 4 gives the basic data for estimating the effect of each condition. Under Condition A, with standard instructions and unrestricted monocular vision, the brain injured and dull normals have about the same amount of rotation. As expected, the normals have a significantly lower mean than the other two groups.

The column labeled B-A measures the ef- 
fect of reducing peripheral vision. Both the 
normals and the dull normals show a slight in- 
crease in rotation, averaging about 1 or 2 de- 
grees per card, whereas the brain injured have 
a significant decrease in rotation. 

The column labeled K = B - C is equal, algebraically, to (B - A) - (C - A) and therefore is equivalent to adding the effect of reduced peripheral vision and subtracting the effect of no-tilt instructions. The dull normals are significantly higher than the brain injured on this combined measure of interaction.
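The algebra behind the combined score can be checked with a short sketch. The function name and the rotation values below are hypothetical, chosen only to mimic the predicted pattern (an increase from A to B for dull normals, a decrease for the brain injured); they are not the study's data.

```python
def difference_scores(a: float, b: float, c: float) -> dict:
    """Return the two condition effects and the combined score K.

    a: rotation under standard instructions (Condition A)
    b: rotation with reduced peripheral vision (Condition B)
    c: rotation with no-tilt instructions (Condition C)
    """
    peripheral_effect = b - a   # column B - A
    no_tilt_effect = c - a      # column C - A
    k = b - c                   # combined score K = B - C
    return {"B-A": peripheral_effect, "C-A": no_tilt_effect, "K": k}


if __name__ == "__main__":
    # Hypothetical average degrees of rotation per card.
    dull_normal = difference_scores(a=10, b=12, c=5)
    brain_injured = difference_scores(a=10, b=8, c=7)
    for name, scores in (("dull normal", dull_normal),
                         ("brain injured", brain_injured)):
        # K always equals (B - A) - (C - A), whatever the baseline A is.
        assert scores["K"] == scores["B-A"] - scores["C-A"]
        print(name, scores)
```

Because A cancels out of the difference, K does not depend on the baseline rotation, which is why two groups that rotate equally under standard instructions can still differ on K.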

K has a statistically significant correlation of .37 with the dichotomous criterion, brain injured vs. dull normals. The multiple regression of the dichotomous criterion on the scores A, B, and C gave a multiple correlation of .42. This does not differ significantly from the .37 validity of K. In other words, the a priori function, K = B - C, is as good as the best empirical discriminating function. Neither function, however, is very useful for diagnostic purposes. The best (i.e., maximum likelihood) cutoff point yields only about 60% correct classification. The average rotation scores for the diagnostic groups shown in Table 3 do not differ significantly, nor was there a significant difference between the averages for disoriented and oriented patients.
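A sketch of the two quantities discussed here, the correlation of K with a two-valued criterion and the proportion classified correctly at the best cutoff, may help. The data are hypothetical. The cutoff search simply maximizes the proportion correct in the sample, a stand-in for the maximum likelihood cutoff the authors mention, whose exact procedure is not given.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation; with a 0/1 criterion this is the point-biserial r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def best_cutoff(ks, is_dull_normal):
    """Try each observed K as a cutoff for the rule 'K >= cutoff means dull normal'.

    Returns (cutoff, proportion_correct) for the first cutoff that
    reaches the highest proportion correct.
    """
    best = (None, 0.0)
    for cutoff in sorted(set(ks)):
        correct = sum((k >= cutoff) == bool(label)
                      for k, label in zip(ks, is_dull_normal))
        accuracy = correct / len(ks)
        if accuracy > best[1]:
            best = (cutoff, accuracy)
    return best


if __name__ == "__main__":
    # Hypothetical K scores: four brain injured (0), then four dull normals (1).
    k_scores = [0, 2, 4, 6, 3, 5, 7, 9]
    criterion = [0, 0, 0, 0, 1, 1, 1, 1]
    print("r =", round(pearson_r(k_scores, criterion), 3))
    print("best cutoff, proportion correct =", best_cutoff(k_scores, criterion))
```

Even a clearly nonzero correlation leaves the two distributions overlapping heavily, which is why a significant group difference can coexist with poor individual classification.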


Discussion

Rotation in brain injured and normals occurs intermittently much as would be anticipated from sporadic spells of inattention. Directing the attention of normal subjects to tilt does reduce rotation to an amount close to the error of measurement, but brain injured patients improve only slightly. The paradoxical finding that reducing peripheral cues causes normals to rotate more, but actually improves the performance of the brain injured subject, implies that relevant, peripheral cues may cause orientation error in patients with brain injury. It may be inferred that there is a malfunctioning of the general integrating mechanism in the brain injured subject, such that relevant peripheral cues hamper performance by producing distorted perception.

M. B. Shapiro (1952) hypothesizes that the greater rotation for brain injured subjects is due to an increase in cortical inhibition caused by trauma. Thus, the brain injured subject is rendered peripherally blind, and integration fails because the peripheral cues are not transmitted by the cortex. Therefore, Shapiro's prediction would be that the field-reducer would have no effect on rotation for the brain injured subjects. The data of this and the previous experiment (Williams et al., 1956) indicate, however, that the field-reducer facilitates correct orientation by the brain injured.

The results obtained by Strauss and Lehtinen (1947) appear to be similar to ours. They state that brain injured subjects are easily distracted by stimuli; therefore reducing the display to its essentials will improve performance. The field-reducer used in the present experiment may prevent peripheral stimuli from distracting the brain injured subject, thus enabling him to concentrate more effectively on the target. In place of Shapiro's "inattention" hypothesis, we would substitute the notion that for the brain injured, relevant peripheral cues provide confusing and distracting information about the visual frame of reference.


SUMMARY 


Experiments were conducted to confirm the 
existence of two interactions of brain injury 
and experimental conditions on block design 
rotation: (a) Instruction to pay attention to 
the tilt of the reproduced designs resulted in 
a greater decrease of rotation for both nor- 
mal and dull normal controls than for the 
brain injured. (b) Restricting peripheral vi-
sion increased rotation for normal and dull
normal controls, but decreased it for the
brain injured. Although the difference in pat- 
terns of performance for dull normals and 
brain injured was statistically significant, it 
was not great enough to furnish a basis for 
individual diagnosis. 

The results from this and previous experiments imply that the basic difficulty for brain injured subjects is not a failure of attention or peripheral blindness, but is a generalized defect of integration such that relevant peripheral cues cause perceptual distortion.


A NOTE ON TIME OF FIRST RESPONSES IN RORSCHACH PROTOCOLS

ALVIN G. BURSTEIN¹

University of Michigan


It is conventional in preparing the summary of a subject's Rorschach performance for diagnostic purposes to compute the mean time of first response (T/1R) for the series of 10 cards. The object is to obtain a measure of central tendency, that is, to estimate the "typical" T/1R for that subject, (a) so that a subject's typical T/1R can be compared with typical times for nosological groups and (b) so that the T/1R on a specific card for a particular subject can be compared with that same subject's typical T/1R. Examples of the clinical value of such comparisons have been furnished by Beck (1949). Depressed and organically deteriorated patients have a typically slower T/1R than do normals (a comparison of the first type mentioned), while neurotic shock can often be identified in terms of a subject's departure from his own typical T/1R (a comparison of the second type).

Although the general practice is to compute the mean T/1R, it might be well to consider whether the median might not be more appropriate. Since the population of response times can extend no lower than zero seconds but up to very high values, and since most response times are clustered near the low end, the population is skewed, and the mean and the median will not coincide. A choice between these two measures of central tendency could be based on the same arguments that favor the use of the median over the mean in describing the "average" American's income: the sensitivity of the mean to extreme values makes it appear preferable to have that figure below which half the cases fall and above which half the cases fall, that is, the median.

¹ The author is indebted to S. J. Beck of the University of Chicago and to Sheldon Korchin of Michael Reese Hospital for their assistance in obtaining access to the normal Rorschach protocols.

The kind of distortion to which the mean T/1R may be subject is illustrated in a case reported by Beck (1949, pp. 281-287). In evaluating evidence for neurotic shock on several cards, Beck used, as a basis of comparison, not the overall mean T/1R of 65.1 seconds, but rather a corrected mean T/1R (28 seconds) obtained by ignoring the three largest values. It should be noted that, because it is less sensitive to extreme values, the median T/1R (33 seconds) could have been used without such correction. Since it is difficult in such cases to make the subjective judgment of which and how many extreme values to drop, the use of the median would appear advantageous.
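The contrast between the mean, a "corrected" mean, and the median can be seen in a brief sketch. The ten response times below are hypothetical and are not Beck's protocol; the helper name and the number of values dropped are likewise illustrative.

```python
from statistics import mean, median


def summarize_t1r(times, drop_largest=0):
    """Return the mean, a corrected mean, and the median of first-response times.

    The corrected mean ignores the `drop_largest` highest values, which
    requires a judgment about how many values count as extreme.
    """
    ordered = sorted(times)
    kept = ordered[:len(ordered) - drop_largest] if drop_largest else ordered
    return {
        "mean": mean(ordered),
        "corrected_mean": mean(kept),
        "median": median(ordered),
    }


if __name__ == "__main__":
    # Hypothetical T/1R values in seconds for the 10 cards, with two long delays.
    t1r = [12, 15, 18, 20, 22, 25, 30, 35, 120, 180]
    print(summarize_t1r(t1r, drop_largest=2))
```

With these values the two long delays pull the mean far above most of the individual times, while the median stays close to the corrected mean without any decision about which values to drop.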

In an effort to supply some normative information, the median T/1R was computed for 154 of the 157 protocols collected as a normative sample by Beck, Rabin, Theissen, Molish, and Thetford (1950; the remaining three were not available at the time of the analysis). These protocols had a mean median T/1R of 25.6 seconds as compared with a mean mean T/1R of 32.5 seconds.

The really critical issue in deciding which measure of central tendency to employ is what we wish to represent by that measure. It is characteristic of the median that it will represent the typical response time in the sense that exactly half of the subject's response times will be shorter than the median


EGO STRENGTH AND CONFLICT DISCRIMINATION: A FAILURE OF REPLICATION

JACK BLOCK

University of California, Berkeley


Korman (1960) recently has reported a study wherein the latency of psychophysical judgments was found to be related to scores on Barron's Es (ego strength) scale. Subjects scoring low on the Es scale were found to discriminate more slowly than subjects scoring high. Further, as the objective difficulty of decision was increased, this difference in latency of judgment was found to increase. These results were sought specifically, as one route to the construct validation of the Es scale. The general principle of validating a measure by relating it to a very different index of the underlying construct is of course a worthy one and it is with reluctance that the present note introduces data which fail to confirm the Korman finding.

In a previous study (Block & Petersen, 1955) after which the Korman experiment in part was patterned, latency scores for both easy and difficult judgment-discriminations were available which are fully equivalent to the latency scores employed by Korman. Also available were scores for each of the 53 subjects on the Es scale. For the easy decision situation and separately for the difficult decision situation, the 53 subjects were ordered with respect to their decision latencies. The Es scores of the fastest 25 subjects were then contrasted with the Es scores of the slowest 25 subjects (the intermediate three subjects being omitted for reason of computational convenience). The fast deciders in an objectively easy decision situation had a mean Es score of 50.96 with an SD of 4.02; slow deciders had a mean of 50.88 with an SD of 5.03. The fast deciders in an objectively ambiguous situation had a mean Es score of 50.72 with an SD of 4.22; the slow deciders had a mean Es score of 51.16 with an SD of 4.88. Obviously, in this study there is no relation between Es scores and the ability to rapidly resolve discrimination conflicts.
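How small these differences are relative to their sampling error can be seen by computing a t ratio from the reported means and SDs. This is an illustrative calculation, not one reported in the note. It assumes 25 subjects per group as stated, treats the SDs as sample standard deviations, and uses an unpooled (Welch-type) standard error.

```python
import math


def t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """t ratio for two independent groups from summary statistics.

    Uses the unpooled standard error sqrt(sd1^2/n1 + sd2^2/n2).
    """
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


if __name__ == "__main__":
    # Fast vs. slow deciders, easy decision situation (Es means and SDs from the text).
    t_easy = t_from_summary(50.96, 4.02, 25, 50.88, 5.03, 25)
    # Fast vs. slow deciders, ambiguous decision situation.
    t_ambiguous = t_from_summary(50.72, 4.22, 25, 51.16, 4.88, 25)
    print("easy:", round(t_easy, 2), "ambiguous:", round(t_ambiguous, 2))
```

Both ratios fall well below 1 in absolute value, consistent with the statement that fast and slow deciders do not differ on Es.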

How may such an empirical discrepancy be 
understood? What factors may be contribut- 
ing to a finding of relationship in the one 
study and the absence of a relationship in 
the other? 

One immediately obvious consideration is that the samples employed in the two studies are radically different. The Korman study used 47 psychiatric inpatients, all presumably with sufficient internal and manifest psychopathology to warrant commitment. The Block-Petersen study employed 53 Air Force captains, all presumably individuals within the normal range of adjustment. The mean Es score for the Korman sample was about 41, well below the Block-Petersen sample mean of 51.32. The Es standard deviation of the Korman sample is 6.75, somewhat but not significantly higher than the SD of 4.54 in the Block-Petersen sample. These are important differences, for the psychological significance of a given Es score in the one sample may not correspond to its meaning in the other sample. Simply at the quantitative level the differences in the Es means of the two samples suggest that the high Es scorers of the Korman sample were relatively low scorers when considered relative to the Block-Petersen sample.

It would be presumptuous to discuss the comparative merits of these two samples for an appropriate test of the Korman hypothesis. Properly, many more samples should be studied so that a pattern of results and their converging implication may appear. It would seem clear, however, in this instance and in many more that doubtless could be recounted that the characteristics of the sample being studied must be recognized explicitly as modifying in decisive ways the relationships observed (Block, 1955). To the plea of Camp-

Block, J., & Petersen, P. Some personality correlates of confidence, caution, and speed in a decision situation. J. abnorm. soc. Psychol., 1955,


BRIEF REPORTS

ANALYSIS OF THE WISC PERFORMANCE OF BRAIN DAMAGED AND EMOTIONALLY DISTURBED CHILDREN

VINTON N. ROWLEY

State University of Iowa

A diagnostic question, to which the clinical psychologist is often called upon to contribute, is the differentiation of emotionally disturbed and brain damaged children with average or superior intelligence. In this situation, it is not uncommon to employ the discrepancy between the Verbal scale and Performance scale of the WISC and similar indices as aids in the identification of brain damage. However, there is little empirical evidence to support these particular uses of the WISC as a diagnostic technique. The present investigation analyzed the WISC performances of emotionally disturbed and brain damaged children with respect to such characteristics as overall level and pattern of performance, differences between Verbal and Performance scale quotients, and differences between subtest means.

The emotionally disturbed group consisted of [number illegible] children (mean age 10.5 years) who had been seen in the [...] Service, University of Iowa Psychopathic Hospital, because of behavioral maladjustment and who had been diagnosed as emotionally disturbed with no evidence or history of cerebral disease. The brain damaged group consisted of 30 children (mean age 10.6 years) who had been seen in the Pediatrics Clinic, University Hospitals, and who showed unequivocal evidence of disease involving one or both cerebral hemispheres.

In order to provide for as precise a comparison as possible, certain restrictions were observed in the selection of subjects. The subjects were individually matched with respect to [...] WISC Full Scale IQ to minimize [...] these variables on performance patterns. A minimal IQ of 83 was established in order to exclude defective children.

The essential findings were: (a) there was no significant difference between the two groups with respect to either Verbal scale or Performance scale IQ; (b) Verbal scale-Performance scale relationships were not significantly different in the two groups; (c) the profiles of subtest scores in the two groups were not significantly different; (d) none of the individual subtest scores showed a significant intergroup difference.

The general conclusion to be drawn from these findings is that when overall level of performance is controlled, examination of the pattern of WISC performance does not provide a basis for differentiating nondefective brain damaged children from nondefective emotionally disturbed children.

This investigation was supported by a grant (B-616 [...]) from the National Institute of Neurological Diseases and Blindness, United States Public Health Service. The writer is indebted to Arthur L. Benton for aid in planning and [...] the study.

An extended report of this study may be obtained without charge from Vinton N. Rowley (Department of Psychiatry, State University of Iowa; Iowa City, Iowa) or for a fee from the American Documentation Institute. Order Document No. 6872 from ADI Auxiliary Publications Project, Photoduplication Service, Library of Congress; Washington 25, D. C., remitting in advance $1.75 for microfilm or $2.50 for photocopies. Make checks payable to: Chief, Photoduplication Service, Library of Congress.
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SOCIAL DESIRABILITY IN THE RATINGS OF INVOLVED 
AND NEUTRAL JUDGES’ 


GEORGE LEVINGER 


Western Reserve University 


Research on personality rating devices has re- 
vealed a high positive correlation between prob- 
ability of item endorsement and the item’s per- 
ceived social desirability (e.g., Edwards, 1953, 
1957b). Recent studies have been mainly con- 
cerned with ascertaining the pervasiveness of this 
correlation, and with constructing instruments 
which would reduce this bias. The present report, 
dealing with one aspect of the problem from a 
somewhat different perspective, conceives of per- 
sonality ratings as the reflection of “true score” 
and error displacement. 

On the one hand, it is hypothesized that de- 
sirable traits are truly more common than un- 
desirable ones. It is also hypothesized that raters 
will distort their ratings in a desirable direction 
to the extent of their attraction toward the ob- 
ject rated. 

Data are based on a study of 31 family triads 
consisting of a father, mother, and 11-year-old 
child. As part of a larger study, check list ratings 
of family members were obtained from the par- 
ents and from either clinicians acquainted with 
the family or from teachers acquainted with the 
child. 

Regarding the first hypothesis, it was found that the item endorsement frequencies of all raters for all objects correlated positively with the SD scale values of the items (Edwards, 1957a), though not necessarily at a statistically significant level. The correlations ranged from .08, for the clinicians' ratings of disturbed children, to .83, for the school parents' descriptions of themselves. Such a finding is not surprising considering the lengthy socialization process in any human culture. It seems that an objective judge would tend to place almost any person above the neutral point of social desirability.

Regarding the second hypothesis, parents' ratings of their children, and of themselves, were consistently more favorable than those of the teachers or clinicians. The findings, while limited to the correlational data mentioned above, tend to support the hypothesis.

The findings are not in themselves novel. Yet their implication is that investigations of social desirability loadings should not limit their focus to item content alone, but also concern themselves with the nature of the judge-object relationship.

For example, a disturbed person will probably describe himself less favorably than will a nondisturbed one. It would seem that this is not so much due to the former's different interpretation of item content as to his unfavorable state of personal self-attraction. And when one finds positive changes in self-description among patients in psychotherapy, the scores may merely reflect changing attraction between subject and object.

¹ An extended report of this study may be obtained without charge from George Levinger (Western Reserve University; Cleveland, Ohio) or for a fee from the American Documentation Institute. Order Document No. 6873 from ADI Auxiliary Publications Project, Photoduplication Service, Library of Congress; Washington 25, D. C., remitting in advance $1.25 for microfilm or $1.25 for photocopies. Make checks payable to: Chief, Photoduplication Service, Library of Congress.

The research was part of a larger study supported by a grant from the National Institute of Mental Health.
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MEASUREMENT OF THE SEVERITY 


OF DISORDER IN SCHIZOPHRENIA 


BY MEANS OF THE HOLTZMAN INKBLOT TEST 


RICHARD A. STEFFY 


University 


Elgin Prognostic Scale ratings of behavioral
and case history data were correlated with genetic
level ratings (Becker, 1956) of Holtzman Inkblot
responses on a sample of 36 Veterans Administration
hospitalized schizophrenics. A product-moment
correlation of -.36 (p < .05) supported the
prediction that subjects with poorer Elgin Prognostic
Scale scores tend to give more diffuse,
undifferentiated, immature responses to inkblot stimuli.
Although the relationship between genetic level
and Elgin ratings was not as high as the one
found by Becker (1956) using the Rorschach,
differences between samples in duration of present
hospitalization were shown to attenuate the
relationship. Longer hospitalization leads to lower
Elgin ratings on some scales (duration of psychosis,
social withdrawal) and to improved functioning
on inkblot tests (as precipitating stresses
are removed). Partialing out the effect of duration
of hospitalization increased the correlation
between the Elgin scale and the genetic level
scoring of the Holtzman to -.46.
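
The partialing step described above can be illustrated with a minimal sketch of the standard first-order partial correlation, computed from three pairwise correlations. The function name and the input values are hypothetical; the report does not give the correlations of either measure with duration of hospitalization, so the numbers below are not the study's data.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant.

    r_xy, r_xz, r_yz are the three pairwise product-moment correlations.
    """
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0.0:
        raise ValueError("undefined when x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical values only (not taken from the report):
# x = one measure, y = the other measure, z = the variable partialed out.
r_partial = partial_correlation(r_xy=-0.30, r_xz=0.40, r_yz=0.30)
print(round(r_partial, 3))
```

With these illustrative inputs the partial correlation is larger in magnitude than the zero-order correlation, which is the pattern the abstract reports when a third variable attenuates the observed relationship.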

Additional analyses of the Holtzman test were
made to explore the limits of its potential in this
area. Although based on a small sample and needing
cross-validation, an item analysis revealed a
best subset of 13 Holtzman cards that correlated
-.53 (and -.64 after partialing out duration of
hospitalization) with the criterion.
The best linear combination of five
Holtzman variables entering into the genetic level
scoring system did nearly as well in predicting
the Elgin scale as the pattern scores of the genetic
level scoring system. It is concluded that
more extensive studies of this type with the Holtzman
test offer promise of producing good measures
of degree of pathology in schizophrenia.

An important area of interpersonal functioning
is the relationship between an individual and
the family members with whom he is living. In
conjunction with a study of the effectiveness of
casework with relatives of hospitalized psychiatric
patients, a review of the literature failed to
reveal a scale appropriate for assessing, or measuring
changes in, the patient-family relationship.
This study reports the development of such scales
and an evaluation of their reliability.

The Lyons Relationship Scales consist of two 
parts: Schedule I, The Relationship from the 
Viewpoint of the Patient; and Schedule II, The 
Relationship from the Viewpoint of the Relative. 
They were designed to be used after an interview 
with the patient and an interview with the rela- 
tive. Each schedule consists of 13 items reflecting 
the description given by the patient of his rela- 
tive (or vice versa) with references to specific 
behavioral characteristics we thought of as being 
influential in determining the nature of a rela- 
tionship between close relatives. Specifically the 
items dealt with demandingness, consideration of 
mate’s views and feeling, interchange of ideas, 
adaptability, discussion of problems, sharing of 
work, degree of overprotectiveness, expression of 
affection, degree of hostility, money arrange- 
ments, and consideration of patient’s illness. In 
addition there is an item on each schedule con- 
cerning each relative’s global evaluation of the 
“goodness” of the relationship and one item 
which calls for the rater’s evaluation of the re- 
lationship based on an overall appraisal of the 
elicited material. 

The reliability of these instruments was assessed
by studying results obtained from Schedule I,
The Relationship from the Viewpoint of
the Patient. The assumption was made that since
the items on Schedule II were analogous to those
on Schedule I, ratings based upon interviewing
a relative would be at least as reliable as
those based upon interviewing a patient. Thirty-four
patients were interviewed and independently
rated by a panel of four social workers.

The index Q (Hester, 1957, unpublished) was
utilized to assess the reliability of the scale. Q
differs from most indices of correlation in that it
takes into account not only the degree to which
different raters agree on an item, but also the
extent to which the item differentiates one
respondent from another. Only two of the items
(sharing of work and consideration of patient's
illness) did not attain a satisfactory Q value. The
item on adaptability was discarded since it was
too often considered unrateable. Two others
(discussion of problems and money arrangements)
were borderline and were tentatively retained
pending further study.

As a further attempt to assess the reliability
of the scale, the statistic κ (kappa) developed by
Cohen (1960) was utilized. One of the major respects
in which this statistic differs from Q is that
rather than being based upon dichotomizing each
variable, all the item rating points are used.
Analysis revealed moderate agreement for all the
scale items except for the two that did not attain
a satisfactory Q.
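
As a minimal sketch of the kappa computation for a single pair of raters, the following assumes ratings on a multi-point scale and uses every rating point rather than a dichotomy. The function name and the ratings are made up for illustration. The report does not say how the four raters were combined, so only the two-rater case is shown.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters rating the same subjects."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of subjects")
    n = len(ratings_a)
    # Observed proportion of exact agreement.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical ratings of ten subjects on a four-point item.
rater_1 = [1, 2, 3, 3, 2, 1, 4, 4, 2, 3]
rater_2 = [1, 2, 3, 2, 2, 1, 4, 3, 2, 3]
kappa = cohen_kappa(rater_1, rater_2)
print(round(kappa, 3))
```

Because this is the unweighted form, a disagreement of one scale point counts the same as a disagreement of three; it is shown only to make the chance correction concrete.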

We feel that these scales, with face validity
and demonstrated reliability, may be of value in
that they constitute a measure of an important
area of interpersonal relationships.

A new projective technique

“The best in tests. . . .”

Could you use a projective test eliciting
a rich variety of unguarded re-
personality? Easily and quickly administered?
Giving results of demonstrated
reliability and validity? For
children as well as adults?

KAHN TEST OF SYMBOL ARRANGEMENT

uses simple plastic objects arranged by
the examinee in various sequences according
to several different sets of
easily-followed instructions. An excellent
supplement to or substitute for
the TAT and Rorschach, combining
the principal advantages of both.

Obtain information and/or order
(complete set $25.00) from
PSYCHOLOGICAL TEST SPECIALISTS
Box 1441
Missoula, Montana






















